Hello,
I have two types of networks.
Type 1: 2-layer 1DConv + Dense
Type 2: 3-layer 1DConv + Dense
I performed hyperparameter tuning (with wandb) and found an optimal network (best parameters) for each type. Both achieve the same MSE on the unseen test data. During my experiments, I also keep track of each model's memory usage: I convert the model to a quantized TFLite model (post-training quantization) and record its size.
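For reference, the conversion step looks roughly like this. This is only a minimal sketch: the layer sizes and input shape below are made-up placeholders, not my actual tuned architecture.

```python
import tensorflow as tf

# Placeholder model standing in for either tuned network;
# sizes and input shape are hypothetical.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 3, activation="relu", input_shape=(128, 1)),
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # single regression output
])

# Post-training quantization via the TFLite converter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# "Memory usage" here = size of the serialized flatbuffer.
print(f"Quantized model size: {len(tflite_model)} bytes")
```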
What surprises me is that the ("larger") 3-layer 1DConv + Dense network needs less memory than the 2-layer 1DConv + Dense one (at the same MSE, i.e. for the same regression task). When you tune only the parameters of the Dense layer, this has a large impact on memory, probably because the Dense layer is fully connected and therefore has more weights to store. Of course, comparing the two models remains difficult because there are so many parameters you can tune…
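To make that intuition concrete, here is a rough weight count with hypothetical sizes (just to illustrate the scaling, not my real models):

```python
# Conv1D: params = kernel_size * in_channels * filters + filters (bias)
#         -> independent of the sequence length.
# Dense:  params = input_units * units + units (bias)
#         -> after Flatten, input_units = sequence_length * channels.

def conv1d_params(kernel_size, in_channels, filters):
    return kernel_size * in_channels * filters + filters

def dense_params(input_units, units):
    return input_units * units + units

seq_len, channels = 128, 32                   # hypothetical feature-map shape
print(conv1d_params(3, channels, 32))         # 3*32*32 + 32  = 3,104
print(dense_params(seq_len * channels, 64))   # 4096*64 + 64  = 262,208
```

So an extra Conv1D layer is cheap in weights, and if it shrinks the feature map (or lets the tuner get away with a smaller Dense layer), the 3-layer model can end up smaller overall.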
What I am currently looking for is a better theoretical and fundamental understanding of the memory impact of each type of layer (1DConv, 2DConv, …, and Dense) inside a network.
Regards,
Joeri