Reinforcement learning on Micro-controllers?


I am not sure whether the research community has done any work on running reinforcement learning algorithms on microcontrollers. Do you have any information to share on this topic, please?

I'm in the same boat as you. Actually, I think it will be hard to find such work. If you understand the fundamentals of reinforcement learning, you know we need some way to represent the value function. Dynamic programming may be infeasible because of the MCU's limited performance. Alternatively, we can use a non-linear function approximator such as a neural network, but in that case we need lots of data to train it.
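To make the value-function point concrete, here is a minimal sketch of the simplest representation, a lookup table, updated with tabular Q-learning (a model-free method, not dynamic programming). The problem size of 16 states and 4 actions, and the names `q_table`, `q_max` and `q_update`, are assumptions chosen for illustration. A table this small costs 256 bytes with 4-byte floats. The table grows with states times actions, which is where a function approximator becomes necessary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical problem size, chosen only for illustration. */
#define N_STATES  16
#define N_ACTIONS 4

/* Value function as a table in static RAM:
 * 16 * 4 * sizeof(float) = 256 bytes with 4-byte floats. */
static float q_table[N_STATES][N_ACTIONS];

/* Largest Q-value available in state s. */
static float q_max(uint8_t s)
{
    assert(s < N_STATES);
    float best = q_table[s][0];
    for (uint8_t a = 1; a < N_ACTIONS; ++a) {
        if (q_table[s][a] > best) {
            best = q_table[s][a];
        }
    }
    return best;
}

/* One Q-learning step:
 * Q(s,a) += lr * (r + discount * max_a' Q(s',a') - Q(s,a)) */
static void q_update(uint8_t s, uint8_t a, float r, uint8_t s_next,
                     float lr, float discount)
{
    assert(s < N_STATES && a < N_ACTIONS);
    float td_target = r + discount * q_max(s_next);
    q_table[s][a] += lr * (td_target - q_table[s][a]);
}
```

After each environment step you would call something like `q_update(s, a, reward, s_next, 0.1f, 0.9f)`, where the learning rate and discount values are again just placeholders.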

One more thing to consider is why we need RL on an MCU at all. The concept of RL is learning the environment dynamics through trial and error. That is, RL on an MCU requires a trial-and-error framework to build the model. AFAIK, TensorFlow Lite for Microcontrollers (TFLM) doesn't support on-device training (weight updates), only inference.

Just my opinion, but who knows what will happen in the future? :slight_smile:


Thank you for the interesting question! RL on an MCU hasn't been explored much in the literature, but some papers have shown it can be useful in the area of robotics.

This first example shows how a policy trained using reinforcement learning can be used on an MCU for low-level control of a tiny drone. It's a very small network that replaces the PID controllers normally used to keep the drone in hover.

A second example, [1909.11236] Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller, a project I have worked on, implements a deep-RL policy for high-level control of a tiny drone that can seek a light source. The policy decides where to go next (high-level control), instead of how to reach a certain velocity or position (low-level control).

From our experience, implementing a capable deep-RL policy onboard an MCU can be challenging due to the severe resource constraints. The TensorFlow Lite and uTensor inference libraries that we tried caused too much overhead to run reliably onboard a robot. Instead, we opted for a custom lightweight C library for inference, which allowed our policy to run at up to 100 Hz.
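To give a rough idea of what lightweight inference in plain C can look like, here is a minimal sketch of a two-layer fully connected policy. It is not the library described above. The struct `dense_layer_t`, the functions `dense_forward` and `policy_forward`, the ReLU activation, and the `MAX_HIDDEN` limit are all assumptions made for illustration. The point is that the forward pass needs no heap allocation and no runtime beyond a few loops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on hidden-layer width. */
#define MAX_HIDDEN 32

/* One fully connected layer; weights are row-major [n_out][n_in]
 * and can live in flash as const arrays. */
typedef struct {
    const float *weights;
    const float *bias;
    size_t n_in;
    size_t n_out;
} dense_layer_t;

/* out = W * in + b, optionally followed by ReLU. */
static void dense_forward(const dense_layer_t *layer, const float *in,
                          float *out, int use_relu)
{
    for (size_t o = 0; o < layer->n_out; ++o) {
        float acc = layer->bias[o];
        for (size_t i = 0; i < layer->n_in; ++i) {
            acc += layer->weights[o * layer->n_in + i] * in[i];
        }
        out[o] = (use_relu && acc < 0.0f) ? 0.0f : acc;
    }
}

/* Two-layer policy: observation -> hidden (ReLU) -> action.
 * The hidden buffer is static, so nothing is allocated at run time. */
static void policy_forward(const dense_layer_t *l1, const dense_layer_t *l2,
                           const float *obs, float *action)
{
    static float hidden[MAX_HIDDEN];
    assert(l1->n_out <= MAX_HIDDEN && l1->n_out == l2->n_in);
    dense_forward(l1, obs, hidden, 1);
    dense_forward(l2, hidden, action, 0);
}
```

In a control loop you would fill `obs` from the sensors, call `policy_forward`, and send `action` to the actuators. The weights would be exported from the trained model as `const float` arrays.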

Depending on the MCU and the model deployed, you will run into different bottlenecks; in our case it was definitely memory. Training a deep-RL policy on an MCU would be hard (though really interesting) because of the resource constraints. For inference of a deep-RL policy, you can expect challenges similar to those experienced in the TinyML community, except that physical aspects now play a role too, such as the overall weight of the system.

Let me know if you have any more questions or need help with implementing RL on an MCU!

Thanks @chanseok and @Duisterhof