If you have attempted to train a KWS spotting model in your native language (e.g. German, Tamil, or something else), would you please jot it down here? What’s the language and why? Just a sentence or two would be great.
Especially if you have taken Course 2 and Course 3, you know the issue of dataset limitations. So I’m trying to simply see how many people are trying to get the model trained in their own native language to raise awareness and support that we should all strive for democratizing speech technology for such things as KWS (first steps).
I tried to train the model to understand the words open and close in Greek (άνοιξε, κλείσε). I used 20 samples of each word that I gathered from 3 different persons (me, a teenager and a kid (both boys)). I also included the english words on and off from Pete’s dataset.
It wasn’t successful but I don’t know if it is mostly because of the dataset itself or the use of the greek alphabet in the filenames and consequently the labels. I’ll give it another try replacing the greek letters in the filenames with those from the english alphabet.
On and Off were well recognized.
I also participate in the Mozilla Common Voice project for my mother tongue.
Yea, we have definitely had issues with filenames. There is a regex that greps for the files, so could be an issue if it is ignoring some files.