Hi everyone. I decided to write and build my model from scratch with TF 2.4. I completed the architecture a few days ago and trained the model on the Speech Commands dataset, which includes commands such as "Lights", "Fan", and "Heater". But my accuracy didn't go above 88% no matter how much I tuned the hyperparameters. Then I looked at the example implementation in TensorFlow's repository and realized they mix background noise into the data (augmentation, I guess), which I had excluded. Could this be a reason my accuracy isn't going up? I'm still trying to understand the background mixing and volume clamping, so I can't include it just yet. However, neither the MFCC nor the frontend_op micro preprocessing got me above 90%. Also, I shuffled and split the whole dataset into training and validation sets rather than making sure each split contains a certain number of samples per command, and I pick training batches randomly instead of deterministically as implemented in the example. Could this also be a reason my accuracy isn't going higher?
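For reference, the background-noise mixing with volume clamping mentioned above can be sketched roughly like this. This is a minimal NumPy sketch, not the actual speech_commands implementation; the function name `mix_background` and the `max_volume` parameter are illustrative assumptions.

```python
import numpy as np

def mix_background(sample, background, max_volume=0.1, rng=None):
    """Mix a random slice of background noise into a speech clip.

    `sample` and `background` are float waveforms in [-1.0, 1.0].
    A random window of background audio is scaled by a random volume
    and added to the sample, then the result is clamped back into
    [-1.0, 1.0] so it remains a valid waveform. (Illustrative sketch;
    parameter names are assumptions, not the example's actual API.)
    """
    rng = rng or np.random.default_rng()
    # Pick a random background window as long as the sample.
    start = rng.integers(0, len(background) - len(sample) + 1)
    noise = background[start:start + len(sample)]
    # Scale the noise by a random volume in [0, max_volume].
    volume = rng.uniform(0.0, max_volume)
    mixed = sample + volume * noise
    # Clamp so the mixed waveform stays within [-1, 1].
    return np.clip(mixed, -1.0, 1.0)
```

The clamp is the "volume clamping" part: without it, loud noise added to a loud sample can push values outside the valid audio range.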
Hi @predstan I would look at a few items to begin debugging this issue. First, how many commands do you have in total? The tiny_conv model is very small, and its discriminative power drops as you increase the number of categories. If you train a model with just 2 commands, does accuracy go up? Also, take a look at your confusion matrix - are certain keywords being misclassified as silence more than others? That might indicate an issue with those recordings. I would also suggest listening back to some of the misclassified examples and seeing if they are distinct in some way from the correctly classified examples (e.g., whether they are noisier).
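To make the confusion-matrix suggestion concrete, here is a minimal sketch of building one from predicted and true label indices (a hypothetical `confusion_matrix` helper; in practice you could also use `tf.math.confusion_matrix` or `sklearn.metrics.confusion_matrix`):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Example: inspect which true labels land in the "silence" column.
# If silence is class 0, cm[:, 0] shows how often each keyword is
# misclassified as silence.
```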
Are you looking at training or validation accuracy? I would expect validation accuracy to improve if you include background noise and other forms of augmentation such as timeshifts (this helps the model generalize better to data outside of the training set). I think randomly splitting the dataset should not make a significant difference, as long as you ensure there are roughly the same number of samples for each command across your train/val/test splits.
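Ensuring roughly equal per-command counts across splits is usually done with a stratified split. A minimal sketch (the function name `stratified_split` is an assumption; `sklearn.model_selection.train_test_split` with `stratify=` does the same thing):

```python
import random
from collections import defaultdict

def stratified_split(examples, labels, val_fraction=0.1, seed=0):
    """Split so each label contributes the same fraction to validation."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, lab in zip(examples, labels):
        by_label[lab].append(ex)
    train, val = [], []
    for lab, items in by_label.items():
        rng.shuffle(items)
        # Reserve at least one example per label for validation.
        n_val = max(1, int(len(items) * val_fraction))
        val += [(ex, lab) for ex in items[:n_val]]
        train += [(ex, lab) for ex in items[n_val:]]
    return train, val
```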
Thank you for this response. I was training on all 38 commands with 3 Conv2D layers. I'm trying to understand the background clamping before deciding to mix in the background noises. Should I mix it into the training set only, or also into the validation and test sets?
I was looking at the validation accuracy, which peaked at 88%. I was trying to push it up to at least 95%. Training accuracy has consistently reached 90%, but validation accuracy has not.
38 categories is a lot, and that can make high accuracies across all categories more challenging to achieve. You might want to look at top-5 accuracy as well to aid in your investigations.
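Top-5 accuracy can be computed directly from the model's logits. A minimal sketch (the helper name `top_k_accuracy` is an assumption; Keras provides `tf.keras.metrics.SparseTopKCategoricalAccuracy` for the same purpose):

```python
import numpy as np

def top_k_accuracy(logits, labels, k=5):
    """Fraction of examples whose true label is among the k highest scores."""
    # argsort is ascending, so the last k columns are the top-k classes.
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = [label in row for row, label in zip(topk, labels)]
    return float(np.mean(hits))
```

With 38 classes, a large gap between top-1 and top-5 accuracy suggests the model is often close but confusing similar-sounding commands, which points toward data issues rather than model capacity.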
I would suggest only applying augmentation to your training data.
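The train-only augmentation pattern looks like this in outline (a sketch with assumed names; `augment` stands in for whatever transform you use, such as noise mixing or timeshifts):

```python
def build_training_data(train_examples, val_examples, augment):
    """Apply augmentation to the training split only.

    Validation stays untouched so its accuracy reflects performance
    on clean, unmodified recordings.
    """
    train = [augment(ex) for ex in train_examples]
    val = list(val_examples)  # no augmentation
    return train, val
```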
Thank you very much for your response. I have finally achieved some interesting accuracy on the validation set: I am up to 90% on validation with the 38 labels after 17000 epochs. My model implementation isn't as tiny as the course's, though, but I will be pruning and retraining soon.