
MIT boffins cram ML training into microcontroller memory

Neat algorithmic tricks squeeze training into 256KB of RAM – barely enough for inference, let alone teaching


Researchers claim to have developed techniques that enable a machine learning model to be trained using less than a quarter of a megabyte of memory, making such training feasible on microcontrollers and other edge hardware with limited resources.

The researchers at MIT and the MIT-IBM Watson AI Lab say they have found "algorithmic solutions" that make the training process more efficient and less memory-intensive.

The techniques can be used to train a machine learning model on a microcontroller in a matter of minutes, it is claimed, and the team has detailed the work in a paper titled "On-Device Training Under 256KB Memory" [PDF].

According to the authors, on-device training of a model will enable it to adapt in response to new data collected by the device's sensors. By training and adapting locally at the edge, the model can learn to continuously improve its predictions for the life of the application.

However, the problem with implementing such a solution is that edge devices are often constrained in their memory size and processing power. At one end of the scale, tiny IoT devices based on microcontrollers may have as little as 256KB of SRAM, the paper states, which is barely enough for the inference work of some deep learning models, let alone the training.

Meanwhile, deep learning training frameworks such as PyTorch and TensorFlow typically run on clusters of servers with gigabytes of memory at their disposal, and while edge deep learning inference frameworks exist, some of them lack support for the back-propagation needed to adjust the models.

In contrast, the algorithms and framework the researchers have developed are claimed to reduce the amount of computation required to train a model.

This is no mean feat: a typical deep learning model undergoes hundreds of updates as it learns, and because there may be millions of weights and activations involved, training requires far more memory than running a pre-trained model.
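
To get a feel for the gap, here is a rough back-of-the-envelope sketch in Python. The layer dimensions and the four-byte float assumption are purely illustrative and are not taken from the paper.

```python
# Back-of-the-envelope memory estimate for a single convolutional layer.
# All sizes here are illustrative, not from the MIT paper.

def layer_memory_kb(in_ch, out_ch, kernel, h, w, bytes_per_value=4):
    weights     = in_ch * out_ch * kernel * kernel  # learnable parameters
    activations = out_ch * h * w                    # output feature map
    saved_input = in_ch * h * w                     # kept for the backward pass
    grads       = weights                           # one gradient per weight
    inference = (weights + activations) * bytes_per_value
    training  = (weights + activations + saved_input + grads) * bytes_per_value
    return inference / 1024, training / 1024

inf_kb, train_kb = layer_memory_kb(in_ch=8, out_ch=16, kernel=3, h=28, w=28)
print(f"inference ~{inf_kb:.0f} KB, training ~{train_kb:.0f} KB")
# Even a single toy layer gets uncomfortably close to a 256KB budget once you
# scale it up or stack a few of them, because the saved inputs and gradients
# only exist during training.
```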

(That said, if there are similar projects out there doing non-trivial training on microcontroller devices, let us know.)

One of the MIT team's techniques for making training more efficient is sparse update, which uses an algorithm to identify the most important weights to update in each round of training and skips the gradient computation for less important layers and sub-tensors.

The algorithm works by freezing the weights one at a time until accuracy dips to a set threshold. The remaining weights are then updated, and the activations corresponding to the frozen weights do not need to be stored.

"Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy," explained MIT Associate Professor Song Han, one of the paper's authors. "For our method, we selectively update those important weights and make sure the accuracy is fully preserved," he added.

The second solution is to reduce the size of the weights using quantization, typically from 32 bits to just 8 bits, to cut the amount of memory needed for both training and inference. Quantization-aware scaling (QAS) is then used to adjust the ratio between weight and gradient, to avoid any drop in accuracy that may result from training with the quantized values.
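
As a simple illustration of the first half of that idea, the snippet below quantizes a float32 weight tensor to int8 with a single per-tensor scale. It is a bare-bones sketch, not the paper's scheme, and QAS itself is only flagged in a comment because its exact rescaling is specific to the paper.

```python
import numpy as np

# Bare-bones symmetric int8 weight quantization -- illustrative only.
# The paper pairs this kind of quantization with quantization-aware
# scaling (QAS) so the weight/gradient ratio stays sensible during training.

def quantize_int8(w):
    """Map float32 weights to int8 values plus one per-tensor scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64).astype(np.float32) * 0.1
q, scale = quantize_int8(w)

print("fp32 bytes:", w.nbytes, "-> int8 bytes:", q.nbytes)   # 256 -> 64
print("max round-trip error:", float(np.abs(w - dequantize(q, scale)).max()))
```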

The system changes the order of steps in the training process so more work is completed in the compilation stage, before the model is deployed on the edge device, according to Han.

"We push a lot of the computation, such as auto-differentiation and graph optimization, to compile time. We also aggressively prune the redundant operators to support sparse updates. Once at runtime, we have much less workload to do on the device," he said.

The final part of the solution is a lightweight training system, Tiny Training Engine (TTE), that implements these algorithms on a simple microcontroller.

According to the paper, the framework is the first machine learning solution to enable on-device training of convolutional neural networks with a memory budget of less than 256KB.

The authors say that the training system has been demonstrated operating on a commercially available microcontroller, an STM32F746 based on an Arm Cortex-M7 core with 320KB of SRAM and produced by STMicroelectronics.

This was used to train a computer vision model to detect people in images, a task it completed successfully after just 10 minutes of training, the research states.

With this success under their belt, the researchers now say they want to apply what they have learned to other machine learning models and types of data, such as language models and time-series data.

They believe these techniques could be used to shrink the size of larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine-learning models in future. ®
