I have to say a big [citation needed] to the claim of ARM beating Intel's high end chips on performance per watt, at least on general workloads.
I think it's common to extrapolate Atom vs ARM to Xeon vs ARM in HPC, without thinking through the implications. We may well get higher performance/watt for single threads under ARM - I'm not disputing that, especially for integer work.
However, Amdahl's law is going to raise its head. In the same machine, a higher number of lower performance threads is going to cause lock contention. You'll also have to split computations over more boxes, since the absolute performance of an Intel server will remain far higher (by 2014, we're talking 64 core/128 thread Haswell). Both are likely to be a massive tax on performance.
To fight this, performance per core is likely to see a substantial rise, both in clock frequencies, and as a result of single core complexity. However, this will directly work against the two things that make ARM performance/watt so impressive currently: modest clocks and simple cores.
Also, Intel's entire company is built around making those 100 watt scale processors fast and well. They really stumbled entering the Atom market, both because of a weak design (the chipset drew more power than the CPU itself!) and because of a lack of commitment (using 2-4 year old process nodes).
I think we're likely to see similar teething pains with companies trying to enter the server market for ARM. The institutional knowledge just won't be there. Make a cache architecture that effectively feeds 64 cores? Way different to improving power drain on a mobile CPU for the seventh generation. I expect it will be at least a few generations before design teams are fully up to speed.
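To put rough numbers on the Amdahl's law point, here's a quick sketch. The 95% parallel fraction, the thread counts and the 4x per-thread speed gap are all made up for illustration, not measurements of any real chip. It also ignores lock contention and cross-box communication entirely, which would only make the many-slow-threads case look worse.

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Amdahl's law: speedup over a single thread when only
    `parallel_fraction` of the work can be spread across threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


def throughput(per_thread_perf: float, parallel_fraction: float, n_threads: int) -> float:
    """Relative throughput: per-thread speed times the Amdahl speedup."""
    return per_thread_perf * amdahl_speedup(parallel_fraction, n_threads)


# Hypothetical boxes with the same peak aggregate (16 * 1.0 == 64 * 0.25).
PARALLEL_FRACTION = 0.95  # assumed: 5% of the job is strictly serial
few_fast = throughput(per_thread_perf=1.0, parallel_fraction=PARALLEL_FRACTION, n_threads=16)
many_slow = throughput(per_thread_perf=0.25, parallel_fraction=PARALLEL_FRACTION, n_threads=64)

print(f"16 fast threads: {few_fast:.2f}")
print(f"64 slow threads: {many_slow:.2f}")
```

With these assumed numbers the box with fewer, faster threads comes out well ahead despite the identical peak, because the serial 5% runs at the slow per-thread speed on the other one. Push the parallel fraction toward 1.0 and the gap closes, which is the embarrassingly parallel case.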
Remember that AMD is reasonably well funded, also focused around server CPUs, and often stumbles. AMD is competent; Intel just makes them look incompetent by comparison.
I'm not saying we won't see certain workloads that are better off under ARM; memcached and static http serving are both likely to do well, since they're effectively just shuffling bits around, aren't particularly CPU intensive, and are embarrassingly parallel. But I believe they'll turn out to be the exception, not the rule.
Which is to say, there's nothing magic about ARM that will let them beat x86 at the high end. They'll have to fight for it, and against Intel on their own turf no less.
"There's nothing magic about ARM" is right. Atom, for example, seems competitive as a low-power architecture now that Intel's really trying to get into the mobile market. The open question is whether cheap power-focused cores, from any maker, can compete against big server chips, and I think they do have a niche.
Low-power server makers probably admit, in their hearts, that the workload their stuff works best with is specialized. CPU-munching apps in scripting languages, or very CPU-intensive data work (full-text searches, say), are not what they're good for. Static content, memcache, and boxes that basically broker between other nodes and do very little 'thinking' themselves are candidates. As some Googlers pointed out, any work where the CPU causes much of the user-visible latency is right out.
I'd add that good low-power-CPU servers don't just look like regular servers with a low-power CPU slotted in. You need low-power storage, i.e., Flash. You probably want lots of cores to amortize the energy cost of memory, etc., so that means it works best with a future uarch like Cortex-A15 that supports that. You want low-power memory. Then servers are probably easily sub-1U, so you get a blade-like physical layout, with some resources shared among nodes.
Calxeda probably would rather not hear this, but you might have to cut price, not just improve power use and density, to secure ARM a niche against big, fast Intel chips in the DC. I think that can be done over time, because the premium Intel charges on top of chip manufacturing cost is a lot and ARM IP is relatively cheap. But it may make it hard for Calxeda to make back their initial R&D costs unless 1) some early-adopter customers pay a big premium (possible--Facebook, you into this?), 2) the market eats it up with surprising speed and soon everyone's got some ARM nodes in their racks -- seems unlikely, or 3) investors put in enough to outlast a long slow growth period (and I could see an ARM manufacturer like Qualcomm or Nvidia doing that with their own ARM architectures, but that may not help Calxeda or its investors).
Any comparison's a stretch, but consider that ARM consumer devices aren't only more mobile than high-end computers, they're cheaper too.
I look forward to eating these words in a few years.