But does the model take a quality hit, for example needing more training steps to converge to similar performance, or needing more parameters to get there?
FP16 obviously carries less information than FP32.
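
To make the "less information" point concrete, here is a minimal sketch using only the Python standard library. It rounds values through IEEE 754 half and single precision with `struct`; the helper names (`round_to`, `to_fp16`, `to_fp32`) and the example weight and update values are my own assumptions for illustration, not taken from any framework.

```python
import struct

def round_to(fmt, x):
    # Pack and unpack a Python float to round it to the given IEEE 754 format.
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def to_fp16(x):
    return round_to("<e", x)  # "e" = half precision (10 fraction bits)

def to_fp32(x):
    return round_to("<f", x)  # "f" = single precision (23 fraction bits)

# Hypothetical weight and a small gradient update.
weight = 1.0
update = 1e-4

# Do the addition, then round the result back to the storage format.
fp16_result = to_fp16(to_fp16(weight) + to_fp16(update))
fp32_result = to_fp32(to_fp32(weight) + to_fp32(update))

print("FP16:", fp16_result)  # the update is rounded away
print("FP32:", fp32_result)  # the update survives
```

Near 1.0, adjacent FP16 values are about 0.001 apart, so an update of 1e-4 disappears in the rounding, while FP32 keeps it. Whether this actually hurts convergence in practice is exactly the open question above.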