Based on @petewarden and @vjreddi's advice in the Getting Started section to just get something up and running, I have installed the micro_speech example on my Nano BLE Sense and combined it with some odds and ends to make it a little easier for me to see from across the room.
Just playing with this has already taught me some valuable lessons about the real-world performance of this model and hardware. For example, the response does not depend on how loud I am when I'm speaking to the device: it does not understand me better when I raise my voice.
I am having great fun with this and I am already working on collecting data for a personal keyword to train the model on. I know this is nothing new for many of you, but I hope it provides a little inspiration for those like me who are just entering the TinyML world.
Very nice! Keep chugging along. In Course 3, we will get into some interesting aspects around KWS, KWS+VWW ("sensor fusion") and then talk about IMU-related examples. Loads of fun stuff coming through.
Feel free to share your updates; folks will continue to thrive and learn.
I wouldn't mind the machine saying that (i.e., don't raise your voice) to my kid. Sometimes my kid can be "rude" to the Alexa/Google voice machine when it doesn't hear her, and I worry what long-term effects that might have on her development. These machines are going to be pervasive and all around us, especially around kids.
KWS machines should come with a parental mode that we can enable. I want the machine to analyze the sentiment and respond accordingly. Because more often than not, it is me telling my kid to say the keyword more nicely, and not to scream at it angrily just because the machine didn't pick her voice up the first time.
Not just the sentiment, @vjreddi: these machines need to be taught emotions. Responses from my Alexa are very bland. Even while reading out news or jokes, there is no expression or emotion.
I look forward to seeing what will be done in the future regarding sentiment and emotion in our interactions with these devices. So far I can't help feeling that the device is training me how to speak to it as much as I am training it to understand me!
@moulipc
Absolutely! I couldn't agree more with what you just said. This is so relatable. I too feel that the developers should reconsider a few things about the AI's NTTS (neural text-to-speech) tech and its adaptability systems. But then again, a lot more data would be required for AIs like Alexa to perform to our expectations. What do you feel should be done to make such programs more human-like?
As mentioned above, I am attempting to build on top of the micro_speech example provided with the Nano BLE board by creating a personal keyword that will work in this application. I intend to follow the full pipeline from data collection to model training to deployment on this device.
My key observation so far:
Though I was warned of this clearly and repeatedly in the TinyML course, I am still surprised by how much effort is required to get my dataset together. I am building on top of the speech_commands dataset and adding one personal keyword, so I thought it would be straightforward. However, I have hit several instances where I realized that my collection/preparation efforts were not as well thought out as they needed to be.
I have also realized that I am going to have to do some model training to get any idea of the quality of my dataset, and it is quite likely that I will have to make some adjustments based on my early results. Though I have made an effort to add variety to the recordings of my new keyword (different background noises, different styles of speaking, etc.), it is still just one person speaking, so I am not sure how that will play out in the training and real-world performance of my model. Time will tell.
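In case it helps anyone else collecting their own keyword, here is a minimal sanity check I run over my recordings, sketched in Python. It assumes the speech_commands convention of 1-second, 16 kHz, 16-bit mono WAV files in one folder per word; the folder path is just my own layout.

```python
import wave
from pathlib import Path

# Folder of my recorded "abracadabra" clips (hypothetical path).
KEYWORD_DIR = Path("dataset/abracadabra")

for wav_path in sorted(KEYWORD_DIR.glob("*.wav")):
    with wave.open(str(wav_path), "rb") as wav:
        rate = wav.getframerate()            # expect 16000 Hz
        channels = wav.getnchannels()        # expect mono
        duration = wav.getnframes() / rate   # expect roughly 1.0 s
    if rate != 16000 or channels != 1 or abs(duration - 1.0) > 0.05:
        print(f"Check {wav_path.name}: {rate} Hz, {channels} ch, {duration:.2f} s")
```

Catching off-spec clips here has saved me from confusing errors later in the training pipeline.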
I must give my thanks to all those who have worked to prepare the excellent TinyML course, and, of course, Pete Warden and all those at Google for sharing their work. They have laid out the path for me to follow and I appreciate it very much.
What you are doing is exactly what we will be doing in the "Custom Dataset Generation" module in Course 3! So you are way ahead of the game. Awesome to hear that you are trying it out on your own.
Some of the directions we provide will hopefully be useful to you, and hey, for all we know you might have some interesting pointers to contribute. When we release the labs, please don't hesitate to contribute back with your feedback and suggestions. The more we contribute back, the more we can make it accessible to everyone!
What’s the use case you are going after? Just curious.
Mostly this project is just an exercise to help me really absorb some of what we have studied so far. It is very important to me to be able to use this information on a practical level as well as knowing it academically.
That having been said, this is going to be the first generation of the “Abracadabra” application. The intent is that someone will pick a card, I will say “abracadabra” and the device will tell them the name of their card.
So wait… just want to make sure I understand. What is the part where you need ML? How will the device be able to “see” the card? Or is the idea that somehow the device has to “predict” the card?
Well, I don’t want to spoil the magic, but the idea is that I would use my knowledge of magic to learn the card and then cue the device to announce it in an interesting fashion.
The idea of a “magical assistant” that can be seen or unseen and can respond to subtle audible or visible cues by launching a scripted action is extremely powerful. For example, saying “ABracadabra” and playing one recording vs. saying “abracaDAbra” and playing a different recording is already the basis for a powerful theatrical experience.
And in the long run, if I am able to learn how to effectively and efficiently train models to respond to a personalized set of commands/gestures at a high confidence level, I thought it might be good for people with physical or verbal challenges to be able to create personal command sets that are comfortable for them and could be used to interface with other devices.
But these are dreams for down the road. For now, I’ve got a lot to learn!
Still working on preparing my dataset for training. I've made some decisions based on best guesses about how the training will come out, and some for practical reasons. For example, I am taking subsets of each word in the speech_commands dataset rather than using the entire collection.
I am also adding content to the background noise collection in the dataset. I am taking recordings around my home and intend to include some clips with music and people speaking in the background. In these cases, I am being careful about the licensing: as mentioned in the Course, I don't want to spoil my dataset with incorrectly licensed content.
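For anyone curious, here is roughly how I am building the subset, sketched in Python under my own assumptions. speech_commands unpacks into one folder per word plus a _background_noise_ folder for long noise recordings; the paths and the clips-per-word count are just my choices.

```python
import random
import shutil
from pathlib import Path

# Hypothetical paths for the source dataset and my working copy.
SRC = Path("speech_commands_v0.02")
DST = Path("my_dataset")
CLIPS_PER_WORD = 1000  # arbitrary subset size per word

random.seed(42)
for word_dir in SRC.iterdir():
    if not word_dir.is_dir():
        continue
    out_dir = DST / word_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    clips = sorted(word_dir.glob("*.wav"))
    if word_dir.name == "_background_noise_":
        chosen = clips  # keep all noise files; my own home recordings go here too
    else:
        chosen = random.sample(clips, min(CLIPS_PER_WORD, len(clips)))
    for clip in chosen:
        shutil.copy2(clip, out_dir / clip.name)
```

My recordings of the new keyword then go into their own word folder alongside the subsampled ones.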
The reason for the focus on background noise is that I have been running the micro_speech example on the Nano BLE in my home for extended periods and taking note of what causes false positives/false negatives on the current keywords ("yes", "no" and "unknown"). I have been a bit surprised by what has caused these results. For example, certain types of music seem to trigger a lot of false "no" readings regardless of the lyrics. I am hopeful that exposing the model to a wider variety of sounds in addition to the words will create a more discerning model, especially with regard to the background noise that is common in my home.
Overall, I wish my progress was faster, but I am really enjoying what I am learning as I go along!
It works! I am happy to say that I have successfully recorded my own keyword, combined it with a subset of the speech_commands dataset, trained a model, converted it to TFLite Micro and deployed it on the Nano BLE board with the micro_speech example code.
There is a LOT of room for improvement and fine tuning, but for the moment, I am happy that I can say my own keyword and the LED on the board lights up.
Many, many thanks to @vjreddi and the course staff, as well as all those working on TensorFlow who have made it possible for me to get this far.
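For anyone following along, the conversion step looked roughly like the sketch below. This is not the exact notebook code: the saved-model path and input shape are my assumptions (the micro_speech features are a 49x40 spectrogram, flattened to 1960 values in my setup), and a real representative dataset should feed actual feature data rather than the random placeholder used here to keep the sketch self-contained.

```python
import numpy as np
import tensorflow as tf

# Hypothetical path to the trained model exported from the training run.
SAVED_MODEL_DIR = "train/saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # Placeholder calibration inputs; real spectrogram features from the
    # training set should be used so int8 ranges are calibrated properly.
    for _ in range(100):
        yield [np.random.rand(1, 1960).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# The .tflite file can then be turned into a C array for the Arduino sketch,
# e.g. with: xxd -i model.tflite > model_data.cc
```

Swapping the resulting C array into the micro_speech example (and updating the category labels) is what got the LED responding to my word.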
I hope my accomplishment serves as a small inspiration for anyone working on their own projects or for those who hope to begin one.
You are one step ahead of everyone, as in Course 3 we get folks to do pretty much what you already did. The course is kicking off tomorrow – hooray! Hope you pick up some new things in there. Looking forward to hearing your thoughts.
Have fun!
(sorry for being MIA, been busy getting Course 3 packaged up)
The keyword is "abracadabra." I also trained against the word "off," as I think it might be useful to have a word that could cancel actions triggered by the keyword.
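In case it helps, here is a tiny sketch of how the two wanted words feed into the label set, assuming the comma-separated variable style used in the training notebook (exact names may differ in your version). The training pipeline prepends silence and unknown categories, so the deployed model ends up with four outputs.

```python
# My two wanted words, in the comma-separated form the training scripts expect.
WANTED_WORDS = "abracadabra,off"

# Silence and unknown are prepended by the pipeline, giving four categories
# whose order must match the labels used in the Arduino sketch.
labels = ["silence", "unknown"] + WANTED_WORDS.split(",")
print(labels)  # ['silence', 'unknown', 'abracadabra', 'off']
```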
I am happy with getting the chip to respond to my word, but this project raised many questions, so I am very excited to see what Course 3 will bring. Thank you again for your encouragement!