tinyML Talks - tinyMLPerf: Deep Learning Benchmarks for Embedded Devices

Hi Folks,

Tomorrow, I will be giving a talk on tinyMLPerf: Deep Learning Benchmarks for Embedded Devices. It is all about how we systematically measure and assess machine learning performance on tinyML devices.

If you have taken the edX courses, you should be curious about what goes on under the hood. Imagine that in the future you want to know which device to use for your ML task. How would you go about figuring that out? Ah… that’s why you should come to the talk.

Please register (for free!) below if you’d like to learn more.

The talk will give you a sense of how different tinyML devices compare with one another.

Please let me know if you would like to hear about more talks like this. The goal here is to help continue the online learning experience outside of edX!


While it is easy to benchmark TensorFlow Lite models via Colab, what standards or tools can be used to benchmark device performance? For example, KWS on the Nano 33: would the benchmark be an audio file that you play to its microphone? Will there be extensions to the code to create a confusion matrix or some similar measure?

Thank you @vjreddi for the very interesting talk. I look forward to watching the development of TinyMLPerf. I thought your point about the importance of building consensus at this stage vs. any specific technical detail was very strong.

There are so many interesting things to learn!

Hey @stephenbaileymagic,

Great that you could make the talk.

I am curious: did you find the email I sent out yesterday via edX useful? In the email, I mentioned a few opportunities to expand your knowledge, the tinyMLPerf talk being one of them. Then there’s the mention of joining the tinyML summit for free, etc. I don’t want to spam you folks, but I also feel compelled to help all of us gain exposure to new opportunities for expanding well beyond what the course has to offer.

Candid feedback is welcome :slight_smile:


I can only speak for myself of course, but I appreciate the links to the other learning opportunities. The class was my first exposure to all of this, but the more I learn about the whole ecosystem of TinyML the more I like it. I am of the opinion that the more of the big picture I can see, the more I can do with each part of that picture.

Thanks again!


Great question! You can certainly use the Arduino device we use in the course to run the tinyMLPerf benchmarks. That’s exactly what the benchmarking process is all about: running benchmarks on real devices and producing useful results that help us compare them.

As per the benchmark outlined by tinyMLPerf, the input samples are actually fed into the system under test (SUT). So the input can either be a spectrogram; we don’t feed time the pre-processing (I hope this resonates from Course 3).

Please be sure to check out the tinyML summit; I sent the link in the email via edX. It is a great way to see the wealth of things that are happening out in the industry over the course of a week – all free thanks to the tinyML foundation and the members that support it, and they’ve been very supportive of all the learners in our course!


“So the input can either be a spectrogram; we don’t feed time the pre-processing.”

I am not sure what you mean. Either… or? I poked around looking for a KWS benchmark in reference_submissions; the closest seemed to be mbed_basic, but I couldn’t find anything for micro_speech.

It seems that to have a truly solid benchmark, we would have to feed the device audio, maybe from a separate system. Then we would need to add code to the micro_speech .ino to build some kind of confusion matrix array for the benchmark session. We could keep it pretty slim and read it out to the serial monitor. I know that if we are allowed to tweak enough parameters we can predict almost anything, but I’d still like to be able to modify average_window_duration, detection_threshold, suppression_ms, and minimum_count from RecognizeCommands.
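A minimal sketch of the kind of slim confusion-matrix accumulator described above, assuming the four-label KWS setup from the course (silence, unknown, "yes", "no"). The `ConfusionMatrix` type is hypothetical, not part of micro_speech; on device, the `Print` output would go to the serial monitor via `Serial.print` instead of `printf`:

```cpp
#include <cstdio>

// Hypothetical labels for the course KWS model:
// 0 = silence, 1 = unknown, 2 = "yes", 3 = "no".
constexpr int kNumLabels = 4;

struct ConfusionMatrix {
  int counts[kNumLabels][kNumLabels] = {};  // counts[expected][predicted]

  // Call once per benchmark sample with the known label and the model output.
  void Record(int expected, int predicted) { counts[expected][predicted]++; }

  int Total() const {
    int total = 0;
    for (int e = 0; e < kNumLabels; ++e)
      for (int p = 0; p < kNumLabels; ++p) total += counts[e][p];
    return total;
  }

  // Fraction of samples on the diagonal (correctly classified).
  float Accuracy() const {
    int correct = 0;
    for (int i = 0; i < kNumLabels; ++i) correct += counts[i][i];
    const int total = Total();
    return total ? static_cast<float>(correct) / total : 0.0f;
  }

  // On the Arduino this would be read out over the serial monitor.
  void Print() const {
    for (int e = 0; e < kNumLabels; ++e) {
      for (int p = 0; p < kNumLabels; ++p) std::printf("%5d", counts[e][p]);
      std::printf("\n");
    }
  }
};
```

The table is only `kNumLabels * kNumLabels` ints, so it fits comfortably in the Nano 33’s RAM alongside the model.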

Benchmarking with spectrograms may seem cleaner, but I suspect it is not real-world enough, and I don’t think I would believe the results.

Hi, I’m a co-chair of TinyMLPerf and I can hopefully answer some questions.

It seems that to have a truly solid benchmark, we would have to feed the device audio

This is a great point. If we really want to understand the latency of the whole system (sensor → action), we would need to include data capture in the benchmark. In practice, this is very difficult to implement in a comparable and consistent way (the hallmarks of a benchmark).

So instead we focus on neural network (NN) inference performance. While not the whole story, inference time is what most of our submitters and users of the benchmark care about. Most of the hardware and software vendors are focused on optimizing inference because it’s the new component in the data flow (we’ve been capturing sensor data for a while).
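To illustrate the split described above, here is a sketch of timing only the inference step, with sensor capture deliberately left outside the timed region. `RunInference` is a stand-in for the real interpreter call (e.g. `interpreter->Invoke()` in TensorFlow Lite Micro), not a tinyMLPerf API:

```cpp
#include <chrono>

// Stand-in for the real NN inference call; the busy loop is placeholder work.
void RunInference(volatile int* sink) {
  int acc = 0;
  for (int i = 0; i < 100000; ++i) acc += i;
  *sink = acc;  // volatile sink keeps the loop from being optimized away
}

// Average inference latency in microseconds over `iterations` runs.
// Sensor capture and pre-processing happen outside this timed region,
// mirroring the benchmark's focus on inference performance alone.
double AverageLatencyUs(int iterations) {
  volatile int sink = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) RunInference(&sink);
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(end - start).count() /
         iterations;
}
```

Averaging over many iterations smooths out timer granularity, which matters on microcontrollers where a single inference may be close to the clock resolution.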

build some kind of confusion matrix

We do capture the output of the model and test the accuracy/AUC on device during the validation phase. It’s important to note that we primarily care about how long it takes to run inference and how much energy it consumes.

Benchmarking with spectrograms may seem cleaner but I suspect it not real-world enough and I don’t think I would believe it.

KWS is actually the outlier in the benchmark suite in that the pre-processing is timed along with the NN inference. Therefore, we pass the audio files in one at a time; the system under test pre-processes each one, runs inference, and then reports the latency and result.
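That per-file flow might be sketched as follows. `Preprocess` and `Invoke` are hypothetical stand-ins for the real spectrogram extraction and model inference (the placeholder "classifier" just returns the index of the largest feature); the point is that, for KWS, the timed region covers both stages:

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Stand-in for spectrogram extraction: here, just scale int16 audio to floats.
std::vector<float> Preprocess(const std::vector<int16_t>& audio) {
  std::vector<float> features(audio.size());
  for (size_t i = 0; i < audio.size(); ++i) features[i] = audio[i] / 32768.0f;
  return features;
}

// Stand-in for NN inference: placeholder argmax over the features.
int Invoke(const std::vector<float>& features) {
  int best = 0;
  for (size_t i = 1; i < features.size(); ++i)
    if (features[i] > features[best]) best = static_cast<int>(i);
  return best;
}

struct Result {
  int label;
  double latency_us;
};

// One benchmark iteration for a single audio file. Unlike the other suite
// benchmarks, the timed region includes pre-processing AND inference.
Result RunOneSample(const std::vector<int16_t>& audio) {
  const auto start = std::chrono::steady_clock::now();
  const auto features = Preprocess(audio);
  const int label = Invoke(features);
  const auto end = std::chrono::steady_clock::now();
  return {label,
          std::chrono::duration<double, std::micro>(end - start).count()};
}
```

The harness then collects the per-file labels (for the accuracy check during validation) and latencies (for the performance result).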