Here is a picture of the gradient of a Conv2D layer's kernel. It shows a zigzag pattern that I would like to understand. I understand that the gradient changes from mini-batch to mini-batch, but why does it increase after each epoch?
I am using the Keras Adam optimizer with default settings, so I don't think that is the reason. Dropout and batch normalization should also not be the reason. I am using image augmentation, but that does not change …
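For reference, here is a minimal sketch of how such per-batch kernel-gradient norms might be logged. The model architecture, layer index, and toy data are illustrative, not my actual setup; only the Adam-with-defaults part matches the question.

```python
import numpy as np
import tensorflow as tf

# Illustrative model: a single Conv2D layer followed by a classifier head.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()  # default settings, as in the question

# Toy data standing in for the real (augmented) images.
x = np.random.rand(256, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(256,))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

grad_norms = []  # one L2 norm per mini-batch, across all epochs

for epoch in range(3):
    for xb, yb in dataset:
        with tf.GradientTape() as tape:
            logits = model(xb, training=True)
            loss = loss_fn(yb, logits)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        # The Conv2D kernel is the first trainable variable of this model,
        # so grads[0] is the gradient being plotted in the question.
        grad_norms.append(tf.norm(grads[0]).numpy())
```

Plotting `grad_norms` against the mini-batch index produces the kind of per-batch gradient curve described above, with epoch boundaries every `len(dataset)` points.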