ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data
It’s because of the batch normalization layers. In the training phase, each batch is normalized w.r.t. its own mean and variance. In the testing phase, however, the batch is normalized w.r.t. the moving average of the previously observed means and variances. This is a problem when the number of observed batches is small (e.g., 5 in your example) …
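To make the mismatch concrete, here is a minimal pure-Python sketch of a single batch normalization feature. It is not the code of any particular framework: the momentum of 0.99, the epsilon, the initial moving statistics (mean 0, variance 1), and the activation values are all assumptions chosen for illustration. It feeds the same batch through the layer 5 times in training mode and then normalizes that batch in inference mode.

```python
import statistics

MOMENTUM = 0.99  # assumed moving-average momentum (illustrative)
EPSILON = 1e-3   # assumed small constant for numerical stability


def batch_norm_train(batch, state):
    """Normalize with the batch's own statistics and update the moving averages."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)
    state["mean"] = MOMENTUM * state["mean"] + (1 - MOMENTUM) * mean
    state["var"] = MOMENTUM * state["var"] + (1 - MOMENTUM) * var
    return [(x - mean) / (var + EPSILON) ** 0.5 for x in batch]


def batch_norm_infer(batch, state):
    """Normalize with the moving averages only, as done at test time."""
    return [(x - state["mean"]) / (state["var"] + EPSILON) ** 0.5 for x in batch]


# Assumed initial moving statistics and made-up activations (mean 50, variance 2).
state = {"mean": 0.0, "var": 1.0}
batch = [48.0, 50.0, 52.0, 50.0]

# Only 5 training steps, so the moving averages barely move from their start.
for _ in range(5):
    train_out = batch_norm_train(batch, state)

infer_out = batch_norm_infer(batch, state)

print("moving mean after 5 batches:", round(state["mean"], 3))
print("training-mode output:", [round(v, 2) for v in train_out])
print("inference-mode output:", [round(v, 2) for v in infer_out])
```

Under these assumptions the moving mean has covered only about 5% of the distance to the true batch mean after 5 updates, so the inference-mode outputs are far from the roughly zero-centered values the following layers saw during training, even though the input data is identical.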