Have you trained a KWS model in your native language?

vjreddi · March 21, 2021, 8:17pm

Hi,

If you have tried to train a keyword spotting (KWS) model in your native language (e.g., German, Tamil, or something else), would you please jot it down here? What’s the language, and why? Just a sentence or two would be great.

If you have taken Course 2 and Course 3, you already know how limited the available datasets are. I’m simply trying to see how many people are training the model in their own native language. I want to raise awareness of, and build support for, democratizing speech technology, starting with KWS as a first step.

gkapsid · April 1, 2021, 8:10pm

I tried to train the model to recognize the words open and close in Greek (άνοιξε, κλείσε). I used 20 samples of each word, recorded by 3 different people: me, a teenager, and a kid (both boys). I also included the English words on and off from Pete’s dataset.
It wasn’t successful, but I don’t know whether that is mostly because of the dataset itself or because of the Greek letters in the filenames and, consequently, in the labels. I’ll give it another try, replacing the Greek letters in the filenames with Latin (English-alphabet) letters.
On and off were recognized well.
I also participate in the Mozilla Common Voice project for my mother tongue.
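A minimal sketch of the renaming idea above, assuming Python and a simplified Greek-to-Latin letter mapping chosen for illustration (it is not a full ISO 843 / ELOT 743 transliteration). The `to_ascii_label` and `ascii_filename` helpers and the `<label>_<index>.wav` naming scheme are hypothetical, not part of the course notebook.

```python
import unicodedata

# Simplified Greek-to-Latin letter mapping (an assumption for illustration;
# not a complete, standards-conformant transliteration).
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def to_ascii_label(word):
    """Turn a Greek keyword into an ASCII-only label (hypothetical helper)."""
    # NFD splits accented letters such as ά into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop the combining marks (e.g. the tonos accent).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Map each Greek letter; any other character is left unchanged.
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in base)


def ascii_filename(word, index):
    """Hypothetical naming scheme: <ascii label>_<zero-padded index>.wav"""
    return f"{to_ascii_label(word)}_{index:03d}.wav"


if __name__ == "__main__":
    for greek in ("άνοιξε", "κλείσε"):
        print(greek, "->", ascii_filename(greek, 1))
```

Keeping a small lookup from each ASCII label back to the original Greek word would let the original keyword still be displayed at inference time.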

vjreddi · April 2, 2021, 1:19pm

Yeah, we have definitely had issues with filenames. A regex is used to grep for the files, so it could be a problem if it is silently ignoring some of them.
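The thread doesn’t show the actual regex, so the patterns below are hypothetical. This sketch, assuming Python’s `re` module and an assumed `<label>_<index>.wav` layout, shows how an ASCII-only pattern can silently skip Greek filenames while a Unicode-aware `\w` keeps them.

```python
import re

# Hypothetical filenames in an assumed "<label>_<index>.wav" layout.
FILENAMES = ["on_001.wav", "off_001.wav", "άνοιξε_001.wav", "κλείσε_001.wav"]

# An explicit ASCII character class silently skips non-Latin labels.
ASCII_CLASS = re.compile(r"^([a-z]+)_\d+\.wav$")

# With the re.ASCII flag, \w means [a-zA-Z0-9_], so Greek labels are skipped too.
ASCII_WORD = re.compile(r"^(\w+)_\d+\.wav$", re.ASCII)

# For str patterns, Python 3's \w matches Unicode word characters by default,
# so Greek letters are kept.
UNICODE_WORD = re.compile(r"^(\w+)_\d+\.wav$")


def labels_found(pattern, names):
    """Return the label captured from each filename the pattern matches."""
    matches = (pattern.match(name) for name in names)
    return [m.group(1) for m in matches if m]


def skipped(pattern, names):
    """Return the filenames the pattern silently ignores."""
    return [name for name in names if not pattern.match(name)]


if __name__ == "__main__":
    patterns = [
        ("ASCII class", ASCII_CLASS),
        ("ASCII flag", ASCII_WORD),
        ("Unicode default", UNICODE_WORD),
    ]
    for title, pattern in patterns:
        print(f"{title}: kept {labels_found(pattern, FILENAMES)}, "
              f"skipped {skipped(pattern, FILENAMES)}")
```

Printing the `skipped(...)` list for whichever pattern a notebook actually uses is a quick way to check whether any recordings are dropped before training.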
