Looking for help in the community

Hello all,

A colleague (@Danimp94) and I are working on a TinyML project that we would ideally like to enter in the HarvardX competition. We are a little stuck and would appreciate some help from this community. We also think our questions are probably common ones, so discussing them might help other TinyML students in this forum.

First of all, let us introduce what our project is about. We would have liked to include code snippets for a better explanation, but because this post is already long, we attach the Git repo directly:

Project description

In this project we aim to develop an intelligent alarm using TinyML that wakes the user up at the best moment (i.e. the REM stage). The idea is to develop a device capable of identifying the different sleep stages (Wake, Non-REM, REM) and then predicting which moment is most suitable to wake the user up within a given boundary.

For example: the user wishes to wake up at around 8 am, and it is okay if the system wakes them up to 15 minutes earlier or later. The user then defines a threshold of 15 minutes, and the system knows that it can wake the user up at the most appropriate time between 7:45 and 8:15.

For this project we plan to use the IMU integrated in the Arduino Nano 33 BLE Sense and an external heart rate sensor, obtaining a total of 4 different measurements (3 for the accelerometer axes and 1 for the HR) in a time series format.
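A minimal sketch of the wake window described above, in Python for readability rather than as device code. The function names, the `predicted_stage` string and the fallback of ringing at the end of the window are assumptions made for illustration, not part of the project.

```python
from datetime import datetime, timedelta

def wake_window(target, tolerance_minutes):
    """Return the (earliest, latest) times at which the alarm may ring."""
    tolerance = timedelta(minutes=tolerance_minutes)
    return target - tolerance, target + tolerance

def should_ring(now, target, tolerance_minutes, predicted_stage, wake_stage="REM"):
    """Decide whether to ring now, given the stage predicted by the model."""
    earliest, latest = wake_window(target, tolerance_minutes)
    if now >= latest:
        # Assumed fallback: never let the user sleep past the end of the window.
        return True
    # Inside the window, ring only when the preferred stage is detected.
    return earliest <= now and predicted_stage == wake_stage

target = datetime(2024, 1, 1, 8, 0)
print(wake_window(target, 15))
```

With a target of 8:00 and a tolerance of 15 minutes, the window runs from 7:45 to 8:15, matching the example in the post.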

For data acquisition, we have chosen the following dataset: "Motion and heart rate from a wrist-worn wearable and labeled sleep from polysomnography" v1.0.0, which has been labelled by professional personnel using professional equipment.

Because of how this dataset is structured, it needs some pre-processing before it can be used. It contains the following data:

  • Accelerometer data: 3 axes with 50-150 samples per second.

  • Heart rate sensor data: 1 sample per 15 seconds.

  • Labels: sleep stages (0-5: wake = 0, N1 = 1, N2 = 2, N3 = 3, REM = 5). N1, N2 and N3 are the three Non-REM sleep stages. Labels were taken every 30 seconds.

For further data processing, we have done the following: we take the peak value (for each axis) per second of the accelerometer and combine it with the heart rate sensor sample.
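The step above could be sketched roughly as follows. This is only an illustration under stated assumptions: "peak" is taken to mean the value with the largest magnitude on each axis, the heart rate attached to a second is the most recent reading at or before it, and the tuple layouts and function names are hypothetical rather than taken from the repo.

```python
import math
from bisect import bisect_right
from collections import defaultdict

def peak_per_second(accel_samples):
    """accel_samples: iterable of (t_seconds, x, y, z).
    Returns {second: (peak_x, peak_y, peak_z)}, where each peak is the
    value with the largest absolute magnitude in that second (assumption)."""
    buckets = defaultdict(list)
    for t, x, y, z in accel_samples:
        buckets[math.floor(t)].append((x, y, z))
    return {
        second: tuple(max(axis, key=abs) for axis in zip(*rows))
        for second, rows in buckets.items()
    }

def latest_heart_rate(hr_samples, second):
    """hr_samples: list of (t_seconds, bpm) sorted by time.
    Returns the most recent bpm at or before `second`, or None."""
    times = [t for t, _ in hr_samples]
    i = bisect_right(times, second)
    return hr_samples[i - 1][1] if i else None

def build_rows(accel_samples, hr_samples):
    """Combine per-second accelerometer peaks with the heart rate reading."""
    peaks = peak_per_second(accel_samples)
    rows = []
    for second in sorted(peaks):
        bpm = latest_heart_rate(hr_samples, second)
        if bpm is not None:  # skip seconds before the first heart rate sample
            rows.append((second, *peaks[second], bpm))
    return rows
```

Each output row has the 4 measurements mentioned earlier (3 accelerometer axes and 1 heart rate value) plus the second it belongs to.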

Because the resulting dataset is not too large, we use AutoFeat to add derived features (e.g. linear combinations) so that the model has more features to learn from.

Note that we are not including references for this description but we can provide them if someone wishes.


Now that we have described the project in broad terms, here come the questions.

Dataset related questions

  1. Is the processing we have used a good idea, or would it be better to use “rawer” data? We ask because, after pre-processing, our dataset turned out to be very compact, without much information for training.

  2. The dataset contains too much N2-labelled data. This biases the dataset, causing the model to predict mainly this label. What options are there to solve this?

Here is the distribution of labels (4 and -1 are invalid labels, which are not counted in the dataset).
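One common option for an imbalanced label distribution like this is to weight classes inversely to their frequency during training. The sketch below uses the usual "balanced" heuristic, n_samples / (n_classes * class_count); the function name and the `INVALID_LABELS` constant are hypothetical, and whether this helps on this dataset is untested.

```python
from collections import Counter

# Labels the post describes as invalid and excluded from the dataset.
INVALID_LABELS = {-1, 4}

def class_weights(labels):
    """Return {label: weight}, with rarer labels receiving larger weights."""
    valid = [label for label in labels if label not in INVALID_LABELS]
    counts = Counter(valid)
    n_samples, n_classes = len(valid), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

weights = class_weights([2, 2, 2, 2, 0, 5, -1, 4])
print(weights)
```

A dictionary of this shape can be passed to Keras through the `class_weight` argument of `model.fit`, although the stage labels would first need remapping to consecutive class indices (for example REM = 5 to index 4).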


Model related questions

  1. Would it be possible to process the data with CNN (e.g. converting the time series to spectrograms as in the course examples)?

  2. Otherwise, is it better to use a Keras DNN model, with or without treating the data as a time series?

  3. If none of the above, should we use a more traditional Machine Learning model? Could we make it tiny and convert it to .tflite?
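For the spectrogram idea in question 1, a rough sketch of turning a 1-D time series into a magnitude spectrogram is shown below. It uses a naive DFT from the standard library so that it is self-contained; the frame size, hop and Hann window are arbitrary illustrative choices, and a real pipeline would more likely rely on an FFT library.

```python
import cmath
import math

def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames, each a list of DFT magnitudes for
    frequency bins 0..frame_size//2 (naive O(n^2) DFT, illustration only)."""
    # Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

# A sine wave with 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(128)]
spec = spectrogram(signal)
```

The resulting 2-D array (time frames by frequency bins) is the kind of image-like input a small CNN can consume. Whether sleep-stage information is visible in such spectrograms is an open question here.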

Other questions

  1. After tinkering with different variations of our dataset and different types of neural network models, we found that the validation metrics (both loss and accuracy) never tracked the training metrics. This happened for many different combinations, no matter what we changed. The picture illustrates this behaviour:

Is this due to overfitting? Or are we overlooking something vital?

Please feel free to ask any questions you have and to answer any of the questions we have posted. We would be only too happy to hear from anyone in the community!

Thank you for sharing your project, it sounds very interesting and I look forward to seeing the final results.

I do not have any specific answers for the questions you posed, but I have a couple of general thoughts and ideas which may inspire some lines of exploration. These are mostly in regards to the data rather than the model.

My first thought is to ask if you have tried to decouple the accelerometer and heart rate data. Are you able to get a useful prediction from either of them individually? I am not clear which signal is carrying the most important features and it might be worth experimenting.

Second, regarding the excess of N2 data, is there a practical need for your device to use all of the stages? Could you get away with just having Awake, REM and non-REM as labels? If so, you could bucket all of the non-REM samples, shuffle them and then use equal parts Awake, REM and non-REM for your dataset. I realize this would make the dataset even smaller, but it would be balanced and might provide predictions with enough resolution to hit the 30-minute window you mentioned above.
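The bucketing idea could look roughly like this. It is a sketch under assumptions: samples are `(features, stage)` pairs using the dataset's stage codes, and the mapping name and function name are hypothetical.

```python
import random
from collections import defaultdict

# Collapse the dataset's stage codes into three classes; other codes are dropped.
STAGE_TO_CLASS = {0: "wake", 1: "non-REM", 2: "non-REM", 3: "non-REM", 5: "REM"}

def balance_by_undersampling(samples, seed=0):
    """samples: list of (features, stage).
    Returns a shuffled list of (features, class_name) with equal class counts."""
    buckets = defaultdict(list)
    for features, stage in samples:
        if stage in STAGE_TO_CLASS:
            buckets[STAGE_TO_CLASS[stage]].append(features)
    smallest = min(len(rows) for rows in buckets.values())
    rng = random.Random(seed)
    balanced = []
    for name, rows in buckets.items():
        # Randomly keep only as many rows as the rarest class has.
        for features in rng.sample(rows, smallest):
            balanced.append((features, name))
    rng.shuffle(balanced)
    return balanced
```

Every class ends up with as many samples as the rarest one, which is exactly why the dataset shrinks.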

I hope these thoughts are in some way helpful. Best of luck on your project! :slight_smile:


Hi @stephenbaileymagic, thank you very much for your reply. Your thoughts are really good and very helpful for us! :raised_hands:

To your first thought, we have not deliberately tried to decouple accelerometer and heart rate data so far. This is because, after in-depth research, we concluded that accelerometer data alone is not enough to deliver precise predictions in some situations, especially with false wake negatives (meaning that the user wakes up with no considerable wrist movement and the device mistakenly thinks the user is still asleep).

In case of interest, this article explains it very well. I attach a slide from the article that shows this difference:

We thought that adding an extra sensor might provide additional insight for the whole system, although in some cases it might only add redundancy.

Even so, I see your point, and we can try to separate both types of data just to see if it adds meaningful value to our system.

To your second point: yes, we considered reducing the classes to only Awake, REM and non-REM. The problem is that non-REM alone contains most of the data (N1, N2 and N3). This of course makes sense, because people usually spend most of their full sleep cycle in non-REM stages, followed by a short REM period and a very short Awake period. Just for visualization:

So, grouping all the non-REM stages together would make the model even more biased. We wanted to make the dataset roughly 50-50, but we have not found any solution other than “chopping off” non-REM data and making the dataset ridiculously small. Another idea would be to increase the data for the other classes with some sort of artificial data augmentation (as is usually done with CNNs). But again, we do not know if there is a better way to balance it.
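As an illustration of the augmentation idea, a minority class can be oversampled by copying existing rows and adding small random noise ("jitter"). The noise level `sigma` and the function name are assumptions; a sensible value depends on the scale of each feature, and this has not been validated on sleep data.

```python
import random

def oversample_with_jitter(rows, target_count, sigma=0.01, seed=0):
    """rows: list of feature lists for one minority class.
    Returns the original rows plus jittered copies until target_count is reached."""
    rng = random.Random(seed)
    augmented = [list(row) for row in rows]
    while len(augmented) < target_count:
        base = rng.choice(rows)
        # Add zero-mean Gaussian noise to every feature of a randomly chosen row.
        augmented.append([value + rng.gauss(0.0, sigma) for value in base])
    return augmented
```

Augmentation like this should only be applied to the training split, so that jittered copies of a sample never end up in the validation set.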

Thanks again, your time dedicated to reading our queries helps us a lot :wink:


Thank you for linking to the article, I now have a better understanding of some of the challenges involved. In particular, it was interesting to see how the addition of the heart rate monitor influenced the accuracy of the REM/Non-REM/Wake classification.

If I understand correctly, what you really want the machine learning to do is to recognize whether or not any given moment is REM sleep (ring alarm) or not REM sleep (don’t ring alarm) so the other classes in the data are not helping.

I will think about this some more. It is certainly a challenge, but when you succeed in making this device, I would like to buy one! :slight_smile:

Hi, I’m glad the article helped you understand our challenges better. Sorry if it wasn’t clear enough at the beginning, I found it quite hard to introduce such a broad topic concisely.

You are right: in the end, all we want is for the device to ring the alarm based on what the ML model predicts, which is basically one of two classes, REM or non-REM (one of them being very limited in the amount of data).

Thanks again for your time. Anything that might occur to you, please hit us up!

PS: The good news is once deployed on the Arduino, my colleague and I will be the guinea pigs and test it ourselves. So, if one of us turns up late at work, then we’ll know why :wink:


Hi @Chaplin5 ,

I still have you in mind, I hope your project is going well. I was reviewing the Applications of TinyML course for another project I am working on and it made me wonder: Have you tried an anomaly detection approach?

In the class they discuss the K-means and Autoencoder model designs. I am not familiar enough with these to recommend one, but it would seem to fit with your data constraints. You could potentially bucket all of the N1-N3 as “normal” and let REM be seen as anomaly. You could use the small amount of REM data you have to test the system after training to make sure it works.

I am not sure if the N1-N3 data is similar enough to put all together and have the REM stand out, but it might at least give a starting point. I hope this is helpful, or if you have already tried this and it didn’t work, I’d be interested to hear why it didn’t work.
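To make the K-means variant of this idea concrete, here is a small self-contained sketch: fit centroids on "normal" (N1-N3) feature vectors only, then flag a point as anomalous when its distance to the nearest centroid exceeds a threshold derived from the normal data. The helper names, the number of clusters and the 0.95 quantile are illustrative assumptions, not values from the course.

```python
import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    return [sum(column) / len(points) for column in zip(*points)]

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def anomaly_score(point, centroids):
    """Distance to the nearest centroid; larger means less 'normal'."""
    return min(distance(point, c) for c in centroids)

def fit_threshold(normal_points, centroids, quantile=0.95):
    """Score below which roughly `quantile` of the normal points fall."""
    scores = sorted(anomaly_score(p, centroids) for p in normal_points)
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]

normal = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
centroids = kmeans(normal, k=2)
threshold = fit_threshold(normal, centroids)
print(anomaly_score([5, 50], centroids) > threshold)
```

In this toy data the point `[5, 50]` lies far from both clusters and is flagged. Whether REM windows separate from N1-N3 in a comparable way is precisely the open question raised above.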

Best regards,

Hi @stephenbaileymagic,

thanks for sharing your idea. Honestly, we hadn’t really thought of taking an anomaly detection approach, simply for the sake of simplicity. As a first attempt we wanted to use a simple ‘dirty’ model, see what we could get, and from there tweak it or try other models or approaches.

But in this case, given our lack of data, it does make perfect sense. We’ve ended up with roughly a 70-30 distribution between both classes. So, I think your insight might actually be spot on! Of course, we need to bucket N1-N3 and check that these three together differ somehow from REM. But again, at this point it is worth trying.

Before that, though, we want to give our previous approach one last try. The reason is that we can in fact get more data. Initially, in its rawest state, the dataset comes with millions of data points (the result of an IMU sensor recording non-stop for 7-8 hours, for each of the 31 users).

However, because of the nature of this application, where the user is not actively moving the wrist while sleeping, we considered that we could compact the data and keep the most significant movements within a time window of a few seconds (this is in fact described at great length in some of the papers we researched). But just out of curiosity, we will try to input the dataset as a whole, processing only the bare minimum, and see what happens. If the results are still poor, then we’ll undoubtedly follow your suggestion and go for AD.
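A minimal sketch of what "only the bare minimum" processing might look like: slicing the raw, time-sorted samples into windows aligned with the 30-second label epochs and dropping invalid labels. The data layouts, the function name and the assumption that each label timestamp marks the start of its epoch are all hypothetical.

```python
def label_windows(samples, epoch_labels, epoch_seconds=30, invalid=(-1, 4)):
    """samples: list of (t_seconds, features) sorted by time.
    epoch_labels: list of (epoch_start_seconds, stage).
    Returns a list of (window, stage), where window holds the raw feature
    vectors whose timestamps fall in [start, start + epoch_seconds)."""
    windows = []
    i = 0
    for start, stage in sorted(epoch_labels):
        end = start + epoch_seconds
        # Skip samples recorded before this epoch begins.
        while i < len(samples) and samples[i][0] < start:
            i += 1
        window = []
        while i < len(samples) and samples[i][0] < end:
            window.append(samples[i][1])
            i += 1
        # Keep only non-empty windows with a valid stage label.
        if window and stage not in invalid:
            windows.append((window, stage))
    return windows
```

Each window then carries every raw sample of its epoch, rather than one peak per second, at the cost of a much larger input per example.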

Anyway, I’ll keep you up to date as soon as there is some progress :wink:
