Hacker News

I think it's mostly simplicity. Say your erasure code is only fast with something like 17 shards/blocks of data and 3 shards/blocks of parity. I hear numbers in that range for Hadoop, Backblaze, Microsoft, and Facebook object storage, etc.

So sure, you can split large files into groups of 20 pieces (17 data + 3 parity). But in the real world error rates are usually low, while the errors that do happen often come in bursts: adjacent nodes dying, a power strip or circuit blowing, a complete loss of network connectivity for 30 seconds, etc. (And yes, I'm aware you can tell Hadoop to keep 2 copies in-rack and 1 copy outside the rack; that's basically crude interleaving.)
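A toy sketch of that rack-aware placement idea (the rack count here is made up for illustration): spreading one 17+3 group round-robin across racks keeps the number of shards any single rack can take down within what the code can repair.

```python
# Round-robin ("crude interleaving") placement of one erasure-coded group
# across racks, so a whole-rack failure stays within the parity budget.

def place_shards(n_shards, racks):
    """Assign shard indices 0..n_shards-1 to racks round-robin."""
    placement = {r: [] for r in racks}
    for i in range(n_shards):
        placement[racks[i % len(racks)]].append(i)
    return placement

racks = [f"rack{j}" for j in range(7)]        # 7 racks: an assumed cluster size
placement = place_shards(17 + 3, racks)        # one 17+3 group = 20 shards

# Worst case: the rack holding the most shards from this group.
worst = max(len(shards) for shards in placement.values())
print(worst)  # 3 -- losing any one rack costs at most 3 shards, which 3 parities cover
```

With 7 racks no rack holds more than 3 of the 20 shards, so a single rack outage is always repairable; with fewer racks you'd need more parity or wider groups.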

If your code can handle a larger number of blocks/shards, you can spread your parity over more of them, so you can provision for the average error rate while still absorbing substantial bursts of errors.

So maybe instead of 17+3 (17.6% overhead, survives any 3 lost shards) you switch to 2048+192 (9.4% overhead, survives any 192 lost shards, about 8.6% of the 2240 total). Now you can ride out that loss rate even when it's bursty: a run of 4 adjacent failures, which would kill a 17+3 group, barely dents the 192-shard budget. Granted, updates become much more expensive, but in some cases that would be well justified by the added durability.
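To make the burst arithmetic concrete, here is a minimal sketch, assuming an ideal MDS code (any m lost shards per group are repairable): a burst of 4 consecutive shard losses can sink a 17+3 group but is nowhere near the budget of a 2048+192 group.

```python
from collections import Counter

def data_is_lost(k, m, lost_shards):
    """True if any (k+m)-shard MDS group loses more shards than its m parities cover."""
    per_group = Counter(i // (k + m) for i in lost_shards)
    return any(count > m for count in per_group.values())

burst = range(40, 44)  # 4 consecutive shard failures (indices 40..43)

print(data_is_lost(17, 3, burst))      # True: all 4 land in one 20-shard group
print(data_is_lost(2048, 192, burst))  # False: 4 losses vs a 192-shard budget
```

The wide code trades lower overhead for the same or better burst tolerance, at the cost of touching far more shards per update.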



If they needed that, they would have found a method long ago. Instead, they try to minimize the amount of data read/written for every operation. Look up "pyramid codes", for example: a way to use 20 disks in a RAID-like system while reading/writing only 5-6 of them for any operation. Microsoft uses something like that in Azure, AFAIK.
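A toy sketch of the local-parity idea behind pyramid / locally repairable codes (the group sizes and the plain XOR parity here are illustrative; real codes add global parities on top): repairing one lost block touches only its small local group, not the whole stripe.

```python
import random

def xor_all(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

group_size = 4
data = [random.randbytes(8) for _ in range(16)]  # 16 data blocks, 4 local groups

# One XOR parity per local group (a real pyramid code also keeps global parities).
local_parity = [xor_all(data[g:g + group_size])
                for g in range(0, len(data), group_size)]

# Lose block 5; rebuild it from its local group: 3 data reads + 1 parity read,
# instead of reading across all 16 blocks.
lost = 5
g = (lost // group_size) * group_size
survivors = [data[i] for i in range(g, g + group_size) if i != lost]
rebuilt = xor_all(survivors + [local_parity[lost // group_size]])
assert rebuilt == data[lost]
```

That bounded repair fan-in is exactly the "read only 5-6 of 20" property the comment describes.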

With ~2000 data blocks, a naive parity rebuild would read back all of those data blocks whenever any one of them changed, cutting update performance by a factor of ~2000. So you can see why they want those pyramid codes rather than something at the opposite extreme.
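For what it's worth, linear codes also allow incremental parity updates: with the old block and the old parity you can patch the parity without re-reading the rest of the stripe. A single-XOR-parity sketch of that (Reed-Solomon does the analogous patch per parity shard, so every parity shard still gets rewritten; that remaining cost is what local/pyramid codes attack):

```python
import random

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

k = 2048
blocks = [random.randbytes(8) for _ in range(k)]

# Full recompute: reads all k data blocks.
parity = blocks[0]
for b in blocks[1:]:
    parity = xor(parity, b)

# Incremental update of block 1000: read only the old block and the old parity,
# then fold the delta into the parity. Cost is independent of k.
new_block = random.randbytes(8)
parity = xor(parity, xor(blocks[1000], new_block))
blocks[1000] = new_block

# The patched parity matches a full recompute over the updated data.
check = blocks[0]
for b in blocks[1:]:
    check = xor(check, b)
assert parity == check
```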



