
Supercomputers tend to need very high bisection bandwidth.

With the Clos-style network topologies that are commonplace in large data centers today, I'm not sure one couldn't achieve decent results in the public cloud.

AWS networking is pretty terrible, but in GCP, I can get 2 Gbps per core, up to 16 Gbps for an 8-core instance. For any bare-metal deployment, I'm going to be maxed out around 100 Gbps, which will be close to saturating an x16 PCIe bus.

It's hard to find a dual-CPU, frequency-optimized processor with fewer than 8 cores, and I'm not sure that'd be cost effective. With hyperthreading, that yields 32 usable cores, or around 3.125 Gbps per core.
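The bandwidth-per-core arithmetic above can be sketched out; the constants are the numbers quoted in this thread, not vendor specs:

```python
# Napkin math for per-core network bandwidth, using the figures above.
GCP_GBPS_PER_CORE = 2    # GCP egress scales ~2 Gbps per vCPU (figure from this thread)
GCP_CAP_GBPS = 16        # quoted per-instance cap for an 8-core instance
NIC_GBPS = 100           # assumed bare-metal NIC, near x16 PCIe limits
CORES = 32               # dual 8-core CPUs with hyperthreading

def gcp_bandwidth_gbps(cores):
    """Per-instance GCP bandwidth: scales per core, capped per instance."""
    return min(GCP_GBPS_PER_CORE * cores, GCP_CAP_GBPS)

per_core_bare_metal = NIC_GBPS / CORES   # 3.125 Gbps per core
```

So an 8-core GCP instance hits the 16 Gbps cap (2 Gbps per core), while the hypothetical 32-core bare-metal box gets about 3.1 Gbps per core out of a single 100 Gbps NIC.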

Even still, I wager they'd go for better density.

Also, I can get 8 GPUs along with that 8 core/16gbps instance in GCP. Sounds totally doable to me.



My back-of-the-napkin calculation says that I can get 270 petaflops with 2,160 n1-highmem-16 instances, each with 8 V100 GPUs, on preemptible instances, costing roughly $13k/hr, or about $10M/mo.
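The napkin math above can be reconstructed; the per-GPU peak and the preemptible prices are my assumptions (roughly circa-2018 GCP list prices), not figures from the comment:

```python
# Reconstructing the back-of-the-napkin estimate above.
INSTANCES = 2160
GPUS_PER_INSTANCE = 8
V100_FP32_TFLOPS = 15.7      # assumed single-precision peak per V100
PREEMPT_VM_HR = 0.20         # assumed preemptible n1-highmem-16 price, $/hr
PREEMPT_V100_HR = 0.74       # assumed preemptible V100 price, $/GPU/hr

gpus = INSTANCES * GPUS_PER_INSTANCE                        # 17,280 GPUs
petaflops = gpus * V100_FP32_TFLOPS / 1000                  # ~271 PF peak FP32
cost_per_hr = INSTANCES * (PREEMPT_VM_HR
                           + GPUS_PER_INSTANCE * PREEMPT_V100_HR)
cost_per_mo = cost_per_hr * 730                             # ~730 hrs/month
```

With those assumed prices the totals land where the comment does: roughly $13k/hr and just under $10M/mo, for about 270 PF of peak FP32.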


So at $120M/yr, in less than two years you'd exceed the purchase price of the whole machine, and likely get worse interconnect speed and possibly worse raw computational speed too.
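The break-even point follows directly; the on-premises price tag here is an assumption for illustration (the comment only says the crossover is under two years):

```python
# Break-even between renting cloud capacity and buying the machine outright.
CLOUD_PER_MONTH = 10_000_000      # ~$10M/mo from the estimate above
ON_PREM_PRICE = 200_000_000       # assumed purchase price, for illustration

months_to_break_even = ON_PREM_PRICE / CLOUD_PER_MONTH   # 20 months
```

Any purchase price under $240M crosses over inside two years at that monthly rate, which is the comment's point.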

AWS looks like a bad deal here.


If you have a constant load, on-premises is always cheaper. Scientific computation has, for a large enough organization, essentially a 100% duty cycle.

If you have a variable load, cloud infrastructure may make sense if you can easily auto-scale.

In my experience, most real-world business applications are multi-tiered applications with variable loads, and hence are a good fit for cloud infrastructure.

However, attaining the application flexibility and KPIs required for efficient auto-scaling is quite hard and requires strong functional and technical expertise.


My experience totally reflects this. Most enterprise IT infrastructure is idle the majority of the time.

I'm running infrastructure for a SaaS app in k8s. I feel like I'm doing well sustaining >50% efficiency, i.e. all cores running >50% all the time and more than half the memory consumed for things that aren't page cache. Hard to get better efficiency without creating hot spots.
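The efficiency metric described above can be made concrete; the node samples here are made-up illustrative data, not real cluster measurements:

```python
# A minimal sketch of the >50% efficiency target described above:
# average CPU utilization, and memory in use for things other than
# page cache, across all nodes. Sample data is invented.
nodes = [
    {"cpu_util": 0.62, "mem_used_gb": 48, "page_cache_gb": 10, "mem_total_gb": 64},
    {"cpu_util": 0.55, "mem_used_gb": 40, "page_cache_gb": 12, "mem_total_gb": 64},
]

cpu_eff = sum(n["cpu_util"] for n in nodes) / len(nodes)
mem_eff = (sum(n["mem_used_gb"] - n["page_cache_gb"] for n in nodes)
           / sum(n["mem_total_gb"] for n in nodes))

meets_target = cpu_eff > 0.5 and mem_eff > 0.5
```

Excluding page cache from the memory figure matters: the kernel will happily fill otherwise-idle RAM with cache, so raw "memory used" overstates how much of the cluster is doing real work.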


That's GCP on preemptible nodes and you are correct.

Not a great deal.



