Intuitively, one could reason that the learnt weights of the deeper layers become more specific to the images of the training dataset and the task the network is trained for. We observed the same trend in the individual per-class plots. The subtle drops at the mid layers (e.g. 4, 8, etc.) are due to the “ReLU” layers, which half-rectify the signals. Although this rectification contributes to the non-linearity of the trained CNN, it does not help if the activations are used directly for classification.
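The half-rectification effect can be illustrated with a minimal sketch (synthetic activations, not features from an actual network): applying ReLU to roughly zero-mean activations zeroes about half of the entries, discarding the sign information that a linear classifier reading those activations directly could otherwise exploit.

```python
import numpy as np

# Synthetic, roughly zero-mean activations standing in for a mid-layer response
rng = np.random.default_rng(0)
acts = rng.standard_normal(1000)

# ReLU: negative entries are clamped to zero ("half-rectification")
relu = np.maximum(acts, 0.0)

# About half the signal is zeroed out
frac_zeroed = np.mean(relu == 0)
print(f"fraction of activations zeroed by ReLU: {frac_zeroed:.2f}")
```

In a deep network this sparsification is compensated by subsequent layers that learn on the rectified representation, which is why the drop shows up only when the rectified activations are fed straight to a classifier.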