Not being in industry, I forget that AWS doesn't have 100% utilization 100% of the time. One of the reasons the lab likes us lattice QCD people is that we always have more computing to do, and are happy to wait in the queue if we get to run at all. So we really help keep the utilization very high. If the machine ever sits empty, that's a waste of money.
Yup, totally understood. We have a much smaller (but still massive) supercomputer for $REAL_JOB, where there is always a queue of embarrassingly parallel jobs or ranks and ranks of MPI work to do. When we add more resources, the users can simply run their work faster, but the work never really stops no matter how much hardware we add.
As much as people love to hate them, I'd love to see you get IO profiles remotely similar to what you can get with Lustre or Spectrum Scale (GPFS). They're simply in an entirely different ballpark compared to anything in any public cloud.
We're lucky in that the IO for LQCD is small compared to other scientific applications: we're usually only reading or writing gigabytes to terabytes. But also our code uses parallel HDF5, and it's someone else's job to make sure that works well :)
You're right that the IO tends to be very high performance and high throughput, too.
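For anyone curious what "parallel HDF5" means in practice: a collective write through HDF5's MPI-IO driver looks roughly like the sketch below. This is a minimal illustration, not our actual code; it assumes an HDF5 build with parallel support, and the file name, dataset name, and sizes are all made up.

    /* Minimal sketch of a parallel HDF5 collective write.
     * Assumes HDF5 built with --enable-parallel; "lattice.h5" and
     * "field" are illustrative names, not from any real code. */
    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Open one shared file through the MPI-IO driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("lattice.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Each rank owns a contiguous slab of a 1-D dataset. */
        hsize_t local_n = 1024;
        hsize_t global_n = local_n * (hsize_t)nranks;
        hid_t filespace = H5Screate_simple(1, &global_n, NULL);
        hid_t dset = H5Dcreate2(file, "field", H5T_NATIVE_DOUBLE, filespace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t offset = local_n * (hsize_t)rank;
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL,
                            &local_n, NULL);
        hid_t memspace = H5Screate_simple(1, &local_n, NULL);

        double buf[1024];
        for (hsize_t i = 0; i < local_n; i++) buf[i] = (double)(offset + i);

        /* Collective transfer: all ranks cooperate on one large write. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
        MPI_Finalize();
        return 0;
    }

The point of H5FD_MPIO_COLLECTIVE is that the ranks coordinate their pieces into a few large, well-aligned requests, which is exactly the access pattern that parallel file systems like Lustre and GPFS reward.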