Hacker News

I wonder if "normal" RDIMM ECC would be enough to mitigate most of those radiation bit-flipping issues. If so, it wouldn't really make a difference relative to Earth-based servers, since most enterprise servers use RDIMM ECC too.


You'll get bitflips elsewhere besides just in RAM. A bitflip in L1 or L3 cache will be propagated to your DIMM and no one will be the wiser.


I thought server CPUs already handled this? E.g. for Epyc https://moorinsightsstrategy.com/wp-content/uploads/2017/05/...

> Because caches hold the most recent and most relevant data to the current processing, it is critical that this data be accurate. To enable this, AMD has designed EPYC with multiple tiers of cache protection. The level 1 data cache includes SEC-DED ECC, which can detect two-bit errors and correct single-bit errors. Through parity and retry, L1 data cache tag errors and L1 instruction cache errors are automatically corrected. The L2 and L3 caches are extended even further with the ability to correct double errors and detect triple errors.
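The quoted SEC-DED behavior (correct any single-bit error, detect any double-bit error) can be illustrated with a toy extended Hamming(8,4) code. This is a sketch of the general technique only, not AMD's implementation: real cache and DRAM ECC uses much wider codewords (e.g. 64 data bits plus 8 check bits), and all names here are my own.

```python
def secded_encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming(8,4) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]          # data bits d0..d3
    p1 = d[0] ^ d[1] ^ d[3]                            # parity over positions 3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                            # parity over positions 3,6,7
    p3 = d[1] ^ d[2] ^ d[3]                            # parity over positions 5,6,7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]        # codeword positions 1..7
    overall = 0
    for b in bits:                                     # extra overall-parity bit
        overall ^= b                                   # turns SEC into SEC-DED
    return bits + [overall]

def secded_decode(bits):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double-error'."""
    syndrome = 0
    for pos in range(1, 8):                            # XOR of error positions
        if bits[pos - 1]:
            syndrome ^= pos
    overall = 0
    for b in bits:
        overall ^= b
    if syndrome == 0 and overall == 0:
        status = 'ok'
    elif overall == 1:                                 # odd flips => single error
        if syndrome:                                   # syndrome points at the bit
            bits = bits[:]
            bits[syndrome - 1] ^= 1                    # correct it in place
        status = 'corrected'
    else:                                              # even flips, syndrome != 0
        return 'double-error', None                    # detected but uncorrectable
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return status, nibble
```

Flipping one bit of a codeword decodes back to the original value with status 'corrected'; flipping two bits yields 'double-error', which in hardware would typically raise a machine-check rather than return corrupted data.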


Sun Microsystems famously had this problem with servers using the UltraSPARC II, whose cache SRAM lacked ECC. Later versions of the processor added it.


Those already have ECC.


What about the registers?


What about the ALU/FPU/TPU itself?




