
Supercomputers tend to need very high bisection bandwidth.

With the Clos-style network topologies that are commonplace in large data centers today, I'm not sure one couldn't achieve decent results in the public cloud.

AWS networking is pretty terrible, but in GCP, I can get 2 Gbps per core, up to 16 Gbps for an 8-core instance. For any bare-metal deployment, I'm going to be maxed out around 100 Gbps, which will be close to saturating an x16 PCIe bus.

It's hard to find a dual-CPU, frequency-optimized processor with fewer than 8 cores, and I'm not sure that'd be cost effective. With hyperthreading, that yields 32 usable cores, or around 3.125 Gbps per core.
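The bandwidth-per-core arithmetic above can be sketched out; the constants are the numbers quoted in this thread, not vendor specs:

```python
# Napkin math for per-core network bandwidth, using the figures above.
GCP_GBPS_PER_CORE = 2    # GCP egress scales ~2 Gbps per vCPU (figure from this thread)
GCP_CAP_GBPS = 16        # quoted per-instance cap for an 8-core instance
NIC_GBPS = 100           # assumed bare-metal NIC, near x16 PCIe limits
CORES = 32               # dual 8-core CPUs with hyperthreading

def gcp_bandwidth_gbps(cores):
    """Per-instance GCP bandwidth: scales per core, capped per instance."""
    return min(GCP_GBPS_PER_CORE * cores, GCP_CAP_GBPS)

per_core_bare_metal = NIC_GBPS / CORES   # 3.125 Gbps per core
```

So an 8-core GCP instance hits the 16 Gbps cap (2 Gbps per core), while the hypothetical 32-core bare-metal box gets about 3.1 Gbps per core out of a single 100 Gbps NIC.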

Even still, I wager they'd go for better density.

Also, I can get 8 GPUs along with that 8 core/16gbps instance in GCP. Sounds totally doable to me.



My back-of-the-napkin calculation says that I can get 270 petaflops with 2,160 n1-highmem-16 instances, each with 8 V100 GPUs, on preemptible instances, costing roughly $13k/hr, or about $10M/mo.
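The napkin math above can be reconstructed; the per-GPU peak and the preemptible prices are my assumptions (roughly circa-2018 GCP list prices), not figures from the comment:

```python
# Reconstructing the back-of-the-napkin estimate above.
INSTANCES = 2160
GPUS_PER_INSTANCE = 8
V100_FP32_TFLOPS = 15.7      # assumed single-precision peak per V100
PREEMPT_VM_HR = 0.20         # assumed preemptible n1-highmem-16 price, $/hr
PREEMPT_V100_HR = 0.74       # assumed preemptible V100 price, $/GPU/hr

gpus = INSTANCES * GPUS_PER_INSTANCE                        # 17,280 GPUs
petaflops = gpus * V100_FP32_TFLOPS / 1000                  # ~271 PF peak FP32
cost_per_hr = INSTANCES * (PREEMPT_VM_HR
                           + GPUS_PER_INSTANCE * PREEMPT_V100_HR)
cost_per_mo = cost_per_hr * 730                             # ~730 hrs/month
```

With those assumed prices the totals land where the comment does: roughly $13k/hr and just under $10M/mo, for about 270 PF of peak FP32.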


So at $120M/yr, in less than two years you'd exceed the purchase price of the whole machine, and likely get worse interconnect speed and possibly worse raw computational speed too.
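The break-even point follows directly; the on-premises price tag here is an assumption for illustration (the comment only says the crossover is under two years):

```python
# Break-even between renting cloud capacity and buying the machine outright.
CLOUD_PER_MONTH = 10_000_000      # ~$10M/mo from the estimate above
ON_PREM_PRICE = 200_000_000       # assumed purchase price, for illustration

months_to_break_even = ON_PREM_PRICE / CLOUD_PER_MONTH   # 20 months
```

Any purchase price under $240M crosses over inside two years at that monthly rate, which is the comment's point.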

AWS looks like a bad deal here.


If you have a constant load, on-premises is always cheaper. Scientific computation has, for a large enough organization, essentially a 100% duty cycle.

If you have a variable load, cloud infrastructure may make sense if you can easily auto-scale.

In my experience, most real-world business applications are multi-tiered applications with variable loads, and hence are a good fit for cloud infrastructure.

However, attaining the application flexibility and KPIs required for efficient auto-scaling is quite hard and requires strong functional and technical expertise.


My experience totally reflects this. Most enterprise IT infrastructure is idle the majority of the time.

I'm running infrastructure for a SaaS app in k8s. I feel like I'm doing well sustaining >50% efficiency, i.e. all cores running >50% all the time and more than half the memory consumed for things that aren't page cache. Hard to get better efficiency without creating hot spots.
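The efficiency metric described above can be made concrete; the node samples here are made-up illustrative data, not real cluster measurements:

```python
# A minimal sketch of the >50% efficiency target described above:
# average CPU utilization, and memory in use for things other than
# page cache, across all nodes. Sample data is invented.
nodes = [
    {"cpu_util": 0.62, "mem_used_gb": 48, "page_cache_gb": 10, "mem_total_gb": 64},
    {"cpu_util": 0.55, "mem_used_gb": 40, "page_cache_gb": 12, "mem_total_gb": 64},
]

cpu_eff = sum(n["cpu_util"] for n in nodes) / len(nodes)
mem_eff = (sum(n["mem_used_gb"] - n["page_cache_gb"] for n in nodes)
           / sum(n["mem_total_gb"] for n in nodes))

meets_target = cpu_eff > 0.5 and mem_eff > 0.5
```

Excluding page cache from the memory figure matters: the kernel will happily fill otherwise-idle RAM with cache, so raw "memory used" overstates how much of the cluster is doing real work.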


That's GCP on preemptible nodes and you are correct.

Not a great deal.



