Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices | MIT News

Machine learning provides powerful tools that researchers use to identify and predict patterns and behaviors, as well as to learn, optimize, and perform tasks. These range from applications like vision systems on self-driving vehicles or social robots, to smart thermostats, to wearable and mobile devices like smartwatches and apps that can monitor health changes. Although these algorithms and their architectures are becoming increasingly powerful and efficient, they typically require tremendous amounts of memory, computation, and data to train and make inferences.

At the same time, researchers are working to reduce the size and complexity of the devices these algorithms can run on, all the way down to a microcontroller unit (MCU) that’s found in billions of internet-of-things (IoT) devices. An MCU is a memory-limited minicomputer housed in a compact integrated circuit that lacks an operating system and runs simple commands. These relatively cheap edge devices require low power, computing, and bandwidth, and offer many opportunities to inject AI technology to expand their utility, increase privacy, and democratize their use — a field called TinyML.

Now, an MIT team working in TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has devised a technique to shrink the amount of memory needed even further, while improving its performance on image recognition in live video.

“Our new technique can do so much more and paves the way for tiny machine learning on edge devices,” says Han, who designs TinyML software and hardware.

To increase TinyML efficiency, Han and his colleagues from EECS and the MIT-IBM Watson AI Lab analyzed how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically inspired models, patterned after neurons in the brain, and are often applied to evaluate and identify visual features within imagery, like a person walking through a video frame. In their study, they discovered an imbalance in memory utilization, causing front-loading on the computer chip and creating a bottleneck. By developing a new inference technique and neural architecture, the team alleviated the problem and reduced peak memory usage by four to eight times. Further, the team deployed it on their own tinyML vision system, equipped with a camera and capable of human and object detection, creating its next generation, dubbed MCUNetV2. When compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high detection accuracy, opening the door to additional vision applications not previously possible.

The results will be presented in a paper at the Neural Information Processing Systems (NeurIPS) conference this week. The team includes Han, lead author and graduate student Ji Lin, postdoc Wei-Ming Chen, graduate student Han Cai, and MIT-IBM Watson AI Lab research scientist Chuang Gan.

A design for efficiency and memory redistribution

TinyML offers numerous advantages over deep machine learning that happens on larger devices, like remote servers and smartphones. These, Han notes, include privacy, since the data are not transmitted to the cloud for computing but processed on the local device; robustness, as the computing is quick and the latency is low; and low cost, because IoT devices cost roughly $1 to $2. Further, some larger, more traditional AI models can emit as much carbon as five cars in their lifetimes, require many GPUs, and cost billions of dollars to train. “So, we believe these TinyML techniques can enable us to go off-grid, to save carbon emissions and make AI greener, smarter, faster, and also more accessible to everyone – to democratize AI,” says Han.

However, the MCU’s small memory and digital storage limit AI applications, so efficiency is a central challenge. MCUs contain only 256 kilobytes of memory and 1 megabyte of storage. By comparison, mobile AI on smartphones and cloud computing can have 256 gigabytes and terabytes of storage, respectively, as well as 16,000 and 100,000 times more memory. Because memory is such a precious resource, the team wanted to optimize its use, so they profiled the MCU memory usage of CNN designs — a task that had been overlooked until now, Lin and Chen say.
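
To make the imbalance concrete, here is a minimal sketch of the kind of per-block activation-memory accounting described above. It is a hypothetical illustration, not the team’s profiling tool: the feature-map shapes are made up, activations are assumed to be int8, and a block is assumed to need its input and output maps in SRAM at the same time.

```python
# A minimal, hypothetical sketch of per-block activation-memory accounting for
# an MCU-style CNN. The shapes are illustrative, not MCUNetV2's real ones:
# (height, width, channels) of each block's feature map, starting with the input image.
BLOCK_FEATURE_MAP_SHAPES = [
    (224, 224, 3),                                   # input image
    (112, 112, 16), (112, 112, 24), (56, 56, 24),    # early, memory-hungry blocks
    (56, 56, 40), (28, 28, 80), (14, 14, 96), (7, 7, 160),
]

SRAM_BUDGET_KB = 256  # the MCU memory budget cited in the article


def activation_kb(shape, bytes_per_value=1):
    """Memory for one int8 feature map, in kilobytes."""
    h, w, c = shape
    return h * w * c * bytes_per_value / 1024


def profile(shapes):
    """Assume each block must hold its input and output maps at the same time."""
    for i in range(1, len(shapes)):
        peak = activation_kb(shapes[i - 1]) + activation_kb(shapes[i])
        status = "OVER budget" if peak > SRAM_BUDGET_KB else "ok"
        print(f"block {i}: ~{peak:.1f} KB peak ({status})")


profile(BLOCK_FEATURE_MAP_SHAPES)
```

Run on these made-up shapes, only the first few blocks exceed the budget, mirroring the front-loaded memory peak the team observed.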

Their findings revealed that memory usage peaked in the first five convolutional blocks out of about 17. Each block contains many connected convolutional layers, which help filter for the presence of specific features within an input image or video, creating a feature map as the output. During the initial memory-intensive stage, most of the blocks operated beyond the 256 KB memory constraint, offering plenty of room for improvement. To reduce the peak memory, the researchers developed a patch-based inference schedule, which operates on only a small fraction, roughly 25 percent, of the layer’s feature map at one time, before moving on to the next quarter, until the whole layer is done. This method saved four to eight times the memory of the previous layer-by-layer computational method, without adding any latency.
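
The idea can be illustrated with a short, self-contained sketch. It is a toy version, not the MCUNetV2 implementation: it uses a 1x1 convolution so the four quarters are exactly independent, whereas real kernels larger than 1x1 also need overlapping borders, which is the redundancy discussed in the next paragraph.

```python
import numpy as np

def pointwise_conv(x, weights):
    """1x1 convolution: x is (H, W, C_in), weights is (C_in, C_out)."""
    return np.tensordot(x, weights, axes=([2], [0]))

def layer_by_layer(x, weights):
    # Baseline: the whole feature map is processed, and held in memory, at once.
    return pointwise_conv(x, weights)

def patch_based(x, weights, splits=2):
    # splits=2 gives 2x2 patches, i.e. roughly 25 percent of the map at a time,
    # so only one quarter's activations need to be resident in memory.
    h, w, _ = x.shape
    out = np.empty((h, w, weights.shape[1]), dtype=x.dtype)
    ph, pw = h // splits, w // splits
    for i in range(splits):
        for j in range(splits):
            rows = slice(i * ph, (i + 1) * ph)
            cols = slice(j * pw, (j + 1) * pw)
            out[rows, cols] = pointwise_conv(x[rows, cols], weights)
    return out

x = np.random.rand(64, 64, 16).astype(np.float32)
weights = np.random.rand(16, 32).astype(np.float32)
assert np.allclose(layer_by_layer(x, weights), patch_based(x, weights), atol=1e-5)
print("patch-based output matches the layer-by-layer output")
```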

“For example, let’s say we have a pizza. We can divide it into four pieces and only eat one piece at a time, so you save about three-quarters. This is the patch-based inference method,” says Han. “However, it was not a free lunch.” Like photoreceptors in the human eye, the model can only take in and examine part of an image at a time; this receptive field is a patch of the total image or field of view. As the size of these receptive fields (or pizza slices in this analogy) grows, there is increasing overlap, which amounts to redundant computation that the researchers found to be about 10 percent. The researchers also proposed to redistribute the neural network across the blocks, in parallel with the patch-based inference method, without losing any of the vision system’s accuracy. However, the question remained of which blocks needed the patch-based inference method and which could use the original layer-by-layer method, along with the redistribution decisions; tuning all of those knobs by hand was laborious and better left to AI.
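
A back-of-the-envelope calculation shows how this overlap overhead grows with the receptive field. The sketch below uses assumed numbers, a 112 x 112 map split into 2 x 2 patches with a few pixels of extra context ("halo") per side, rather than the paper’s measurements.

```python
def halo_overhead(h, w, splits, halo):
    """Fraction of extra computation when an h x w feature map is split into
    splits x splits patches and each patch needs `halo` extra pixels of
    context on every side (its enlarged receptive field)."""
    ph, pw = h // splits, w // splits
    exact = h * w
    with_halos = splits * splits * (ph + 2 * halo) * (pw + 2 * halo)
    return with_halos / exact - 1.0

for halo in (1, 2, 4):
    overhead = halo_overhead(112, 112, splits=2, halo=halo)
    print(f"halo={halo}: ~{overhead:.1%} redundant compute")
```

With these assumed numbers, the overhead grows quickly as the halo widens, which is the redundancy the redistribution step aims to keep small.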

“We want to automate this process by performing a joint automated optimization search, including both the neural network architecture, like the number of layers, the number of channels, and the kernel size, as well as the inference schedule, including the number of patches, the number of layers for patch-based inference, and other optimization knobs,” Lin explains, “so that non-machine-learning experts can have a push-button solution to improve the computation efficiency but also improve the engineering productivity, to be able to deploy this neural network on microcontrollers.”
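
The sketch below gives a flavor of such a joint search. Everything in it is a placeholder: the search space, the memory model, and the accuracy proxy are invented for illustration, and plain random search stands in for MCUNetV2’s actual optimization algorithm.

```python
import random

SEARCH_SPACE = {
    "kernel_size":      [3, 5, 7],
    "width_multiplier": [0.5, 0.75, 1.0],
    "num_patches":      [1, 4, 9],      # 1 = plain layer-by-layer inference
    "patch_blocks":     [0, 2, 4, 6],   # how many early blocks run patch-based
}

SRAM_BUDGET_KB = 256


def estimated_peak_memory_kb(cfg):
    # Toy memory model: wider networks raise peak memory, more patches lower it.
    base = 800 * cfg["width_multiplier"]
    return base / cfg["num_patches"] + 10 * cfg["patch_blocks"]


def accuracy_proxy(cfg):
    # Toy stand-in for a trained accuracy predictor or an evaluation run.
    return 60 + 10 * cfg["width_multiplier"] + cfg["kernel_size"] - 0.5 * cfg["patch_blocks"]


def joint_search(trials=500, seed=0):
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(trials):
        cfg = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        if estimated_peak_memory_kb(cfg) > SRAM_BUDGET_KB:
            continue  # reject configurations that violate the MCU memory budget
        score = accuracy_proxy(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score


print(joint_search())
```

A real system would replace the toy scoring functions with measured or predicted peak memory and accuracy; the structure, jointly sampling architecture and schedule knobs and rejecting configurations that break the memory budget, is the point being illustrated.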

A new horizon for tiny vision systems

The co-design of the network architecture with the neural architecture search optimization and inference scheduling provided significant gains and was adopted into MCUNetV2; it outperformed other vision systems in peak memory usage and in image and object detection and classification. The MCUNetV2 device includes a small screen and a camera, and is about the size of a headphone case. Compared to the first version, the new version needed four times less memory for the same accuracy, says Chen. When placed head-to-head against other tinyML solutions, MCUNetV2 was able to detect the presence of objects in image frames, such as human faces, with an improvement of nearly 17 percent. Further, it set an accuracy record, at nearly 72 percent, for thousand-class image classification on the ImageNet dataset, using 465 KB of memory. The researchers also tested what are known as visual wake words, the ability of their MCU vision model to identify the presence of a person in an image, and even with only 30 KB of memory it achieved greater than 90 percent accuracy, beating the previous state-of-the-art method. This means the method is accurate enough that it could be deployed to help in, say, smart-home applications.

With high accuracy and low energy use and cost, MCUNetV2’s performance unlocks new IoT applications. Due to their limited memory, Han says, vision systems on IoT devices were previously thought to be good only for basic image classification tasks, but their work has helped expand the opportunities for TinyML use. Further, the research team envisions it in numerous fields, from monitoring sleep and joint movement in the health care industry, to sports coaching and movements like a golf swing, to plant identification in agriculture, as well as in smarter manufacturing, from identifying nuts and bolts to detecting malfunctioning machines.

“We’re really pushing these real-world applications to scale,” says Han. “Without a GPU or specialized hardware, our technique is so tiny it can run on these small, cheap IoT devices and run real-world applications like visual wake words, face-mask detection, and person detection. This opens the door to a whole new way of doing tiny AI and mobile vision.”

This research was sponsored by the MIT-IBM Watson AI Lab, Samsung, Woodside Energy, and the National Science Foundation.
