
HBase, Cassandra, MongoDB, Riak, Couchbase, and Redis were the main candidates.

HBase: http://mail-archives.apache.org/mod_mbox/hbase-issues/201307...

Cassandra: (fsync to WAL, not full fsync). https://wiki.apache.org/cassandra/Durability

MongoDB: ... too much wrong here to list, although I hear it's improving in being cluster aware etc.

Redis does support fsync as far as I remember, but the write/delete pattern is incredibly sub-optimal. It basically runs out of a WAL by itself, and it performs very poorly if your dataset does not fit in memory.



As author of the cited post on HBase lemme clarify:

0. Here are more details on that: http://hadoop-hbase.blogspot.de/2013/07/protected-hbase-agai...

1. By default HBase "flushes" the WAL. Flush here means making sure that at least 3 machines have the change in memory (NOT on disk). A datacenter power outage can lose data.

2. When HDFS closes a block, the block is not forced to disk by default. Since HBase rewrites old data during compactions, old data can be lost during a power outage. Again, by default.

3. HDFS should be configured with sync-on-close, so that old data is forced to disk upon compactions (and sync-behind-writes for performance).

4. HBase now has an option to force a WAL edit (and all previous edits) to disk (that's what I added in said JIRA).

5. This post is 4 years old for chrissake :)... Don't base decisions on 4-year-old information.
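
To make the flush-versus-sync distinction from points 1 and 4 concrete, here is a minimal sketch on a single local file. It is not HBase or HDFS code, and `append_edit` and `demo` are hypothetical names used only for illustration. It shows that a flushed write is already visible to readers while only an fsync asks the OS to push it to the storage device.

```python
import os
import tempfile


def append_edit(path, payload, durable):
    """Append one record to a log file; optionally force it to disk."""
    with open(path, "ab") as wal:
        wal.write(payload)
        # flush() only moves bytes from Python's buffer into the OS page cache.
        # The data is visible to other readers but can be lost on power failure.
        wal.flush()
        if durable:
            # fsync() asks the OS to write the file's data to the storage device.
            os.fsync(wal.fileno())
        # Both variants report the same size, which is why the difference
        # only shows up after a crash or power loss.
        return os.path.getsize(path)


def demo():
    """Write one flushed-only edit and one fsynced edit to a temporary log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        size_after_flush = append_edit(path, b"edit-1\n", durable=False)
        size_after_fsync = append_edit(path, b"edit-2\n", durable=True)
        return size_after_flush, size_after_fsync
```

Both calls look identical from the application's side, so the durability gap is easy to miss in testing; the cost of the durable path is the extra latency of the fsync on every edit.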

HBase _is_ a database and it will keep your data safe. Unfortunately it requires some configuration and some knowledge.



