Providing Higher Performance Solutions for Fast-Growing Edge AI Applications
Edge AI applications such as presence detection and object counting are growing in popularity, and designers increasingly demand low-power, small-form-factor edge AI solutions that do not compromise performance. The latest release of Lattice’s sensAI technology portfolio, available for ECP5 and iCE40 UltraPlus FPGAs, provides designers with the hardware platforms, IP, software tools, reference designs, and design services needed to enable low-power, high-performance AI at the edge.
Abstract
The market for low-cost, high-performance edge solutions is becoming increasingly competitive. Leading market research firms predict that the market for edge solutions will explode over the next six years. IHS predicts that by 2025, there will be more than 40 billion devices operating at the edge, while market intelligence firm Tractica predicts that more than 2.5 billion edge devices will be shipped annually by then.
As a new generation of edge applications emerges, designers are increasingly looking to develop solutions that combine low power and small form factor without compromising performance. Driving these new AI solutions are a growing number of edge applications, such as presence detection for smart doorbells and security cameras in home control, object counting for inventory in retail applications, and object and presence detection in industrial applications. On one hand, the market is demanding that designers develop solutions with higher performance than ever before. On the other hand, latency, bandwidth, privacy, power, and cost issues limit their reliance on cloud computing resources to perform analytics.
At the same time, performance, power, and cost constraints vary by application.
As the data demands of real-time edge applications continue to drive demand for cloud-based services, designers must also contend with familiar power, board area, and cost constraints. How can developers meet increasingly stringent system requirements for power consumption (down to milliwatts) and footprint (roughly 5 mm² to 100 mm²)? The performance requirements alone are difficult to meet.
Leveraging FPGA Advantages
Lattice FPGAs are uniquely positioned to meet the rapidly changing market requirements for edge devices. One way designers can quickly provide more computing resources to edge devices without relying on the cloud is to use the inherent parallel processing capabilities of FPGAs to accelerate neural network performance. In addition, by using low-density, small-footprint FPGAs optimized for low-power operation, designers can meet the stringent power and size constraints of new consumer and industrial applications. For example, Lattice’s iCE40 UltraPlus™ and ECP5™ product families support the development of network edge solutions with power consumption ranging from 1 mW to 1 W and hardware footprints from 5.5 mm² to 100 mm². By combining ultra-low power, high performance, and high accuracy with comprehensive support for traditional interfaces, these FPGAs give network edge device developers the flexibility they need to meet changing design requirements.

Figure 1: Lattice Semiconductor’s low-power, small-footprint FPGAs provide the right combination of performance and features to support network edge artificial intelligence applications
To meet this demand and accelerate development, Lattice introduced sensAI™, the industry’s first technology stack of its kind, to give designers all the tools they need to develop low-power, high-performance network edge devices for smart homes, smart factories, smart cities, and smart cars. Designed to meet the growing demand for AI-enabled network edge devices, sensAI provides comprehensive hardware and software solutions for implementing low-power, always-on AI capabilities in smart devices running at the network edge. Launched in 2018, it allows designers to seamlessly create new designs or update existing ones with low-power AI inference optimized for these new application requirements.
What’s in this comprehensive design ecosystem?
First, Lattice’s modular hardware platforms, such as the iCE40 UPduino 2.0 with HM01B0 Shield development board and the ECP5-based Embedded Vision Development Kit (EVDK), provide a solid foundation for application development. The UPduino can be used for AI designs that require only a few milliwatts, while the EVDK supports applications that need more power but typically operate below 1 W.
Soft IP that can be easily instantiated in an FPGA accelerates the development of neural networks. To that end, sensAI includes a CNN accelerator IP core that enables designers to implement deep learning applications in iCE40 UltraPlus FPGAs, as well as a fully parameterizable CNN accelerator IP core for Lattice’s ECP5 FPGAs. Both IP cores support variable quantization, which lets designers trade off data accuracy against power consumption.
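To illustrate the kind of trade-off that variable quantization enables, the short Python sketch below (not part of the Lattice toolchain; the bit widths and random weights are purely illustrative) quantizes a weight tensor at several bit widths and reports the storage cost and the resulting error:

```python
import numpy as np

def quantize(weights, bits):
    """Symmetric uniform quantization of a float tensor to `bits` bits."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 representable steps for 8-bit
    scale = np.max(np.abs(weights)) / levels
    return np.round(weights / scale) * scale  # de-quantized values for error measurement

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=10_000).astype(np.float32)

for bits in (16, 8, 4, 1):
    if bits == 1:
        # Binarized weights: keep only the sign, scaled by the mean magnitude.
        approx = np.sign(weights) * np.mean(np.abs(weights))
    else:
        approx = quantize(weights, bits)
    rmse = np.sqrt(np.mean((weights - approx) ** 2))
    kib = bits * weights.size / 8 / 1024
    print(f"{bits:2d}-bit weights: {kib:5.1f} KiB of storage, RMSE {rmse:.5f}")
```

Fewer bits mean less on-chip memory and less data movement, and therefore lower power, at the cost of higher quantization error; this is the kind of accuracy-versus-power choice the accelerator IP exposes to the designer.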
Lattice’s sensAI technology collection allows designers to explore design options and trade-offs through an easy-to-use tool flow. Designers can use industry-standard frameworks such as Caffe, TensorFlow, and Keras for network training. The development environment also provides a neural network compiler that maps the trained network model to a fixed-point representation, supporting variable quantization of weights and activations. Designers can use the compiler to help analyze, simulate, and compile different types of networks for implementation on Lattice’s accelerator IP cores without RTL experience. Designers can then use traditional FPGA design tools such as Lattice Radiant and Diamond to implement the entire FPGA design.
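As a concrete, deliberately simplified example of the training side of this flow, the sketch below defines and trains a small VGG-style classifier in Keras; the layer sizes, the random placeholder data, and the file name are assumptions for illustration only, and the saved floating-point model is what would subsequently be handed to the neural network compiler:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Small VGG-style CNN for 64 x 64 x 3 input; layer sizes are illustrative only.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),   # e.g. "person" / "no person"
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for a real labelled image set.
x = np.random.rand(256, 64, 64, 3).astype("float32")
y = np.random.randint(0, 2, size=(256,))
model.fit(x, y, epochs=1, batch_size=32)

# The saved floating-point model is the input to the downstream compiler step.
model.save("person_detect.h5")
```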
To speed up design implementation, sensAI provides a growing number of reference designs and demonstrations, including facial recognition, gesture detection, keyword detection, presence detection, face tracking, object counting, and speed sign detection. Finally, design teams usually need a certain amount of expertise to complete a design. To meet this need, Lattice has established partnerships with design service partners around the world to support customers who lack AI/ML expertise.

Figure 2: Lattice sensAI is a complete hardware and software solution for developing edge AI applications
Major Updates
To meet the rapidly growing performance requirements of edge AI, Lattice released an update to sensAI in 2019 that enhanced its performance and streamlined the design process. The updated sensAI delivers a 10x performance improvement over the previous version, enabled by multiple optimizations, including improved memory access through updates to the CNN IP and the neural network compiler, along with new features such as 8-bit activation quantization, smart layer merging, and dual DSP engines.
In the latest version, the neural network compiler has been updated to support 8-bit input data, which greatly improves the memory access sequence. Not only is access to external memory cut in half, but higher-resolution images can also be used as input, and with higher-resolution input the solution is naturally more accurate.
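A rough back-of-the-envelope calculation shows the effect, assuming each color channel was previously stored as a 16-bit value (an assumption for illustration; the exact formats depend on the design):

```python
# External memory traffic per input frame, comparing 16-bit and 8-bit pixel data.
width, height, channels = 128, 128, 3          # e.g. the 128 x 128 x 3 input used below
pixels = width * height * channels

bytes_16bit = pixels * 2                       # 98,304 bytes per frame
bytes_8bit = pixels * 1                        # 49,152 bytes per frame

print(f"16-bit input: {bytes_16bit / 1024:.0f} KiB per frame")
print(f" 8-bit input: {bytes_8bit / 1024:.0f} KiB per frame "
      f"({bytes_8bit / bytes_16bit:.0%} of the traffic)")
```

Halving the bytes per pixel either halves the memory traffic for the same image or allows a larger image within the same bandwidth budget.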
To further accelerate performance, Lattice optimized the convolutional layers in the sensAI neural network to reduce the time spent on convolution calculations. By doubling the number of convolution engines in the device, Lattice reduced convolution time by about 50%.
Lattice has increased sensAI’s performance without increasing power consumption, allowing designers to select lower gate count devices in the ECP5 FPGA product family. Optimized demos illustrate the performance gains. For example, the people detection demo, optimized for low-power operation, uses a CMOS image sensor and feeds 64 x 64 x 3 image data into a VGG8 network; the system runs at 5 frames per second and consumes only 7 mW on an iCE40 UltraPlus FPGA. A second, performance-optimized demo for a people counting application also uses a CMOS image sensor and feeds 128 x 128 x 3 image data into a VGG8 network; it runs at 30 frames per second and consumes 850 mW on an ECP5-85K FPGA.
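Dividing the quoted power by the frame rate gives a rough per-frame energy figure for each demo. This is simple arithmetic on the numbers above rather than an official Lattice metric:

```python
# Approximate energy per processed frame, derived from the demo figures above.
demos = {
    "iCE40 UltraPlus, 64 x 64 x 3 @ 5 fps": (0.007, 5),    # 7 mW
    "ECP5-85K, 128 x 128 x 3 @ 30 fps":     (0.850, 30),   # 850 mW
}

for name, (power_w, fps) in demos.items():
    energy_mj = power_w / fps * 1000        # millijoules per frame
    print(f"{name}: roughly {energy_mj:.1f} mJ per frame")
```

The two operating points bracket the range sensAI targets: milliwatt-level always-on detection on iCE40 UltraPlus and sub-watt, higher-throughput analysis on ECP5.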

Figure 3: These reference designs show the power and performance options available with sensAI
At the same time, sensAI provides a seamless design experience, supporting more neural network models and machine learning frameworks and thereby shortening design cycles. New customizable reference designs simplify the development of common edge solutions such as object counting and presence detection, while a growing ecosystem of design partners provides important design services. Together, these give developers all the key tools they need to replicate or adapt the designs. For example, the block diagram below shows the comprehensive set of components Lattice provides, including trained models, training datasets, training scripts, updated neural network IP, and the neural network compiler.

Figure 4: The sensAI design flow includes industry-leading machine learning frameworks, training data and scripts, neural network IP and other resources necessary to design and train edge AI devices
Lattice has also expanded its support for machine learning frameworks to provide a seamless user experience. The initial version of sensAI supported Caffe and TensorFlow; the latest version adds support for Keras, an open source neural network library written in Python that can run on top of TensorFlow, the Microsoft Cognitive Toolkit, or Theano. Keras is designed to help engineers implement deep neural networks quickly, providing a user-friendly, modular, and extensible environment that accelerates prototyping. Keras was originally conceived as an interface rather than a standalone machine learning framework, and its high level of abstraction allows developers to accelerate the development of deep learning models.
To further simplify use, Lattice has updated the sensAI neural network compiler tool, which now automatically selects the most accurate number of fractional bits when converting machine learning models into firmware files. The update also adds a hardware debugging tool that lets users read from and write to each layer of the neural network. After software simulation, engineers still need to know how their network performs on actual hardware; with this tool, they can see hardware results in just a few minutes.
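The idea behind that automatic selection can be illustrated with a minimal sketch; this is a generic fixed-point range analysis, assumed here for illustration rather than taken from the compiler itself. For a fixed word length, it picks the largest number of fractional bits that still lets the observed value range fit in the integer part:

```python
import math
import numpy as np

def choose_fraction_bits(values, word_length=16):
    """Pick the largest number of fractional bits such that the observed value
    range still fits into a signed fixed-point word of `word_length` bits."""
    max_abs = float(np.max(np.abs(values)))
    int_bits = max(0, math.ceil(math.log2(max_abs + 1e-12)))  # bits needed for the integer part
    return word_length - 1 - int_bits                          # one bit reserved for the sign

layer_weights = np.random.default_rng(1).normal(0.0, 0.3, size=1000)
frac = choose_fraction_bits(layer_weights, word_length=16)
print(f"Q{15 - frac}.{frac} format, resolution {2.0 ** -frac:.2e}")
```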
In addition, the latest version of sensAI is supported by a growing number of companies that provide design services and product development skills optimized for low-power, always-on network edge devices. These companies help customers build network edge AI devices by seamlessly updating existing designs or developing complete solutions for specific applications.
sensAI Design Cases
This new, higher-performance solution from Lattice can be used in four different accelerator design cases. In the first (Figure 5), design engineers use sensAI to build a solution that operates in standalone mode. This system architecture allows designers to develop an integrated, always-on, low-latency, and highly secure solution on a Lattice iCE40 UltraPlus or ECP5 FPGA, with FPGA resources also available for system control. A typical application is a standalone sensor for people detection and counting.

Figure 5: sensAI as a standalone edge AI processing solution
Designers can also use sensAI to develop two different types of pre-processing solutions. In the first case (Figure 6), Lattice sensAI running on a low-power iCE40 UltraPlus FPGA pre-processes the sensor data, minimizing the cost of transmitting data to the SoC or the cloud for analysis. In a smart doorbell, for example, sensAI first reads the data from the image sensor; if the object is determined not to be a person, say a cat, the system does not wake the SoC or connect to the cloud for further processing, minimizing data transmission costs and power consumption. If the pre-processing stage determines that the object at the door is a person, it wakes the SoC for further processing. This significantly reduces the amount of data the system needs to process while lowering power requirements, which is critical for always-on network edge applications.
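The control flow of such a pre-processing stage can be summarized in a short Python-style sketch; the sensor, detector, and wake-pin objects and the confidence threshold are hypothetical placeholders, since a real design would implement this in FPGA logic and firmware:

```python
import time

PERSON_THRESHOLD = 0.8          # illustrative confidence threshold

def wake_on_person(sensor, detector, soc_wake_pin):
    """Always-on loop: run the low-power detector on every frame and only
    wake the SoC for full processing when a person is detected."""
    while True:
        frame = sensor.read_frame()                  # low-resolution image data
        confidence = detector.person_confidence(frame)
        if confidence >= PERSON_THRESHOLD:
            soc_wake_pin.assert_high()               # wake the SoC / open the cloud link
        # Otherwise (e.g. a cat at the door) stay on the low-power path.
        time.sleep(0.2)                              # ~5 fps polling rate, illustrative
```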

Figure 6: In this case, sensAI pre-processes sensor data to determine whether the data needs to be sent to the SoC for further processing
In the second pre-processing application, designers use an ECP5 FPGA to accelerate the neural network (Figure 7). In this case, the flexible I/O of the ECP5 connects a variety of existing on-board devices (such as sensors) to a low-end MCU for highly flexible system control.
Figure 7: The second system architecture also uses pre-processing, and designers can use ECP5 and sensAI to pre-process sensor data to enhance the overall performance of the neural network
Designers can also use sensAI accelerators in post-processing systems (Figure 8). A growing number of design cases show that many companies have proven MCU-based solutions and want to add AI capabilities without replacing components or redesigning the board, yet in some cases the MCU’s performance is insufficient. A typical example is a smart industrial or smart home application that requires image filtering before analysis. Designers can either add another MCU and go through a time-consuming design verification process, or insert an accelerator between the MCU and the cloud for post-processing, minimizing the amount of data sent to the cloud. This approach is particularly attractive to IoT device developers who want to add AI capabilities.
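A minimal sketch of that post-processing idea is shown below; the classify_on_accelerator and send_to_cloud helpers, the class names, and the confidence threshold are all hypothetical placeholders used only to show the data flow:

```python
def post_process_and_forward(frames, classify_on_accelerator, send_to_cloud,
                             classes_of_interest=("person", "vehicle")):
    """Run each MCU-captured frame through the FPGA accelerator and forward
    only relevant events upstream, instead of streaming every frame to the cloud."""
    forwarded = 0
    for frame in frames:
        label, confidence = classify_on_accelerator(frame)   # inference on the FPGA IP
        if label in classes_of_interest and confidence > 0.7:
            send_to_cloud({"label": label, "confidence": confidence})
            forwarded += 1
    return forwarded    # number of frames actually uploaded
```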
Figure 8: This MCU-based design is enhanced with sensAI to enable existing designs to support edge AI capabilities
Conclusion
It is clear that the next few years will be a critical period for the development of the market for always-on edge smart devices. As applications become more complex, designers will need tools that can support higher performance at low power. The latest version of Lattice sensAI technology, combined with ECP5 and iCE40 UltraPlus FPGAs, will provide designers with hardware platforms, IP, software tools, reference designs, and design services to help them beat their competitors and quickly develop successful solutions.






