So the good news is that individual neuron activations often stay within a relatively narrow range. I think empirical evaluation is needed to tell how robustly this approach works. I think that is certainly the greatest source of noise during training (and the first thing to break if you choose unstable hyperparameters). Great comment.