What exactly do these intermediate values consist of, and why are they so large compared to the model itself? Are there resources I should read on how a model like this executes?
Will this model work with half-precision weights? Is it very awkward to use "brain" 16-bit floats (bfloat16)?