I've done a lot of work in ML numerics, and I think TF32 is a completely safe drop-in for FP32 for ML workloads. NVIDIA seems to agree, which is why on the A100 it won't even be opt-in: it will be the default mode for any FP32 matrix multiply.
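To make the "safe drop-in" claim concrete: TF32 keeps FP32's 8-bit exponent (so the same dynamic range) but only 10 explicit mantissa bits instead of 23. Here's a rough sketch of that precision loss in pure Python; note that real TF32 hardware rounds to nearest, while this toy version just truncates the low mantissa bits for simplicity:

```python
import struct

def tf32_truncate(x: float) -> float:
    # TF32 keeps FP32's 8-bit exponent but only 10 explicit
    # mantissa bits (FP32 has 23). Zeroing the low 13 mantissa
    # bits approximates the precision you lose. (Real TF32
    # rounds to nearest; this sketch truncates.)
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # drop the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# pi survives to ~3-4 decimal digits, which is plenty for
# gradient noise in typical ML training
print(tf32_truncate(3.14159265))  # -> 3.140625
```

That ~3 decimal digits of mantissa is why it's fine for training, where gradient noise dwarfs rounding error, but not something you'd use for, say, a linear solver.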
But on the 3090, I don't think the speedup will be 5x; it should be closer to 2x. The 3090 does 35.6 TFLOPS at TF32, while the Titan RTX does 16.3 TFLOPS at FP32. Once again, I think there's handicapping going on with the 3090.
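The back-of-envelope arithmetic behind that 2x figure, using the peak throughput numbers above (dense, no sparsity):

```python
# Peak-throughput ratio: 3090 TF32 tensor-core rate vs.
# Titan RTX FP32 rate, per the spec-sheet numbers quoted above.
rtx_3090_tf32_tflops = 35.6
titan_rtx_fp32_tflops = 16.3

speedup = rtx_3090_tf32_tflops / titan_rtx_fp32_tflops
print(f"{speedup:.2f}x")  # -> 2.18x, nowhere near 5x
```

Real matmul kernels won't hit either peak exactly, but the ratio of peaks is a reasonable ceiling on the generational speedup.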