Hi all! Attached is a graph of my latest attempt at transfer learning from a MobileNetV1 model already trained for 1 million steps, fine-tuned for 50,000 additional steps on my new dataset. The training-loss chart is very volatile and never converges or improves past a loss of ~0.3. My parameters are as follows:
Learning Rate=0.045
Label Smoothing=0.1
Learning Rate Decay Factor=0.98
Number of Epochs Per Decay=2.5
Moving Average Decay=0.9999
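For context, these parameter names match the flags of TF-Slim's `train_image_classifier.py`, so I'm assuming that's the script in use; a hedged sketch of the invocation (the dataset/checkpoint paths and scope names below are placeholders, not the actual values):

```shell
# Sketch only: paths, dataset_name, and scopes are hypothetical placeholders.
python train_image_classifier.py \
  --train_dir=/tmp/train_logs \
  --dataset_name=my_dataset \
  --dataset_dir=/tmp/my_dataset \
  --model_name=mobilenet_v1 \
  --checkpoint_path=/tmp/checkpoints/mobilenet_v1.ckpt \
  --checkpoint_exclude_scopes=MobilenetV1/Logits \
  --learning_rate=0.045 \
  --label_smoothing=0.1 \
  --learning_rate_decay_factor=0.98 \
  --num_epochs_per_decay=2.5 \
  --moving_average_decay=0.9999 \
  --max_number_of_steps=50000
```

Note that 0.045 is the from-scratch ImageNet learning rate; for fine-tuning, a much smaller value (e.g. on the order of 0.001-0.01) is commonly tried, which may be relevant to the volatility.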
Does anyone have suggestions or ideas about what's causing this behavior, or why training is not converging?