
I'm bothered by the need to run SW RAID on top of HW RAID. One would think that Amazon would sell faster EBS "disks" for a premium.

And slower disks for a discount? But, I guess that's what S3 is for.



I think it works like this:

Even if Amazon uses 'k' HDDs to back one EBS "disk", you're sharing those physical drives with other users, so you don't get the full performance of 'k' HDDs, only a fraction of it.

By RAIDing over 'n' EBS "disks", you are effectively compensating for the reduced performance due to sharing.
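A back-of-the-envelope model of that argument (the numbers here are purely illustrative assumptions, not Amazon's actual disk counts or sharing ratios):

```python
def effective_iops(per_disk_iops, k, share_fraction, n):
    """Rough model: each EBS volume is backed by k physical disks,
    but multi-tenancy leaves you only share_fraction of their
    performance; striping (RAID0) across n volumes scales roughly
    linearly in the ideal case."""
    single_volume = per_disk_iops * k * share_fraction
    return single_volume * n

# One volume backed by 10 disks at ~100 IOPS each, but you only
# get ~10% of the spindles: about one plain disk's worth.
single = effective_iops(100, k=10, share_fraction=0.1, n=1)

# Striping over 8 such volumes claws most of that back.
striped = effective_iops(100, k=10, share_fraction=0.1, n=8)
```

With these made-up inputs, a single volume models out to ~100 IOPS and the 8-wide stripe to ~800 IOPS, which is the intuition behind RAIDing EBS volumes.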


I get what the stack looks like, but it seems really broken and likely quite inefficient. Thus far, Amazon has gone after greenfield applications which can be written within the constraints of their cloud platform. However, there are a ton of people hosting their own SQL database-based apps where a single DB is the bottleneck. Without significant refactoring, these apps can only scale vertically with the DB. So, while Amazon provides nice, big boxes to run SQLServer/MySQL/etc., disk performance is that of a desktop machine -- hardly a balanced system. How many more customers could they capture if they offered premium, high-performance storage options?


You hit the nail on the head as to why I'm looking into a physical DB server with RAIDed SSDs instead of hopping onto EC2. I would love to use Amazon and not have to deal with the potential headaches of managing physical machines, but the stories (maybe FUD) of having to RAID EBS volumes, spin up 20 instances to find the winners and kill the rest, and so on really kill the appeal.

If they could promise me consistent database performance on par with a really nice physical machine, I would gladly fork over $500/month for it.


As someone who has spent the last year and a half running a 200-node environment of persistent instances on EC2, including multiple m1.large and m1.xlarge DB pools, I can assure you those stories stem from FUD and unreasonable expectations.

Yes, EBS is not very fast, especially compared to local disk. You can work around this, however, by configuring multiple volumes in a RAID configuration (as you have mentioned), or by scaling out with additional nodes. The size and workload of your database will dictate which is more cost-effective.

Spinning up many nodes to find the "best" one is completely unnecessary. In my experience, EC2 nodes have been remarkably consistent in performance. I won't say I run a load test on every one, but over 100,000 node launches, I've never had to shut down a poorly performing instance for anything that couldn't be attributed to a hardware issue (which is rare, and for which Amazon sends notifications).

Don't listen to the naysayers. Come on in, the water's fine!


Thanks very much for the FUD-debunk, it's always great to get advice from someone who has thoroughly kicked the tires of something. I may start considering it once again.

Would you mind sharing what kind of small-block IO/sec numbers you've seen from EBS volumes? My app tends to generate lots of IO without much cacheability, and it has a relatively small dataset, which is why I'm considering SSDs in the first place.
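For anyone who wants to measure this themselves, here's a rough sketch of a small-block random-read test (a hypothetical helper, not a real benchmark tool; without O_DIRECT the page cache will inflate the numbers, so something like fio with direct IO is more trustworthy):

```python
import os
import random
import time

def measure_small_block_iops(path, block_size=4096, num_reads=2000):
    """Issue random single-block reads against an existing file and
    return observed reads/second. Page-cache hits are not excluded,
    so treat the result as an optimistic upper bound."""
    file_size = os.path.getsize(path)
    num_blocks = file_size // block_size
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(num_reads):
            # Pick a random aligned block and read it in one syscall.
            offset = random.randrange(num_blocks) * block_size
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return num_reads / elapsed
```

Run it against a file much larger than RAM on the EBS volume, then against the same-sized file on local instance storage, and compare.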


An EBS volume has the performance of roughly a 10-disk RAID; it's hardly desktop class. It would be nice if they offered wide-striped volumes, but Amazon's strategy is not to do anything that customers can kludge together for themselves.


This guy's blog posting suggests otherwise:

"Remember, the speed and efficiency of the single EBS device is roughly comparable to a modern SATA or SCSI drive."

http://af-design.com/blog/2009/02/27/amazon-ec2-disk-perform...

Perhaps EBS has improved drastically over the past year?



