HBase, Cassandra, MongoDB, Riak, Couchbase And redis were the main candidates. H...

linuxhansl · on July 8, 2017

As author of the cited post on HBase lemme clarify:

0. Here are more details on that: http://hadoop-hbase.blogspot.de/2013/07/protected-hbase-agai...

1. By default HBase "flushes" the WAL. Flush here means making sure that at least 3 machines have the change in memory (NOT on disk). A datacenter power outage can lose data.

2. When HDFS closes a block, it is not forced to disk by default. Since HBase rewrites old data during compactions, old data can be lost during a power outage. Again, by default.

3. HDFS should be configured with sync-on-close, so that old data is forced to disk upon compactions (and sync-behind-writes for performance).

4. HBase now has an option to force a WAL edit (and all previous edits) to disk (that's what I added in said jira).

5. This post is 4 years old for chrissake :)... Don't base decisions on 4 year old information.

HBase _is_ a database and it will keep your data safe. Unfortunately it requires some configuration and some knowledge.
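
For point 3, a minimal hdfs-site.xml sketch of what that configuration could look like. It assumes the property names dfs.datanode.synconclose and dfs.datanode.sync.behind.writes as listed in hdfs-default.xml; check them against your Hadoop version before relying on this.

```xml
<!-- hdfs-site.xml on the DataNodes (sketch; verify against your Hadoop release) -->
<configuration>
  <!-- Force a block file to disk when it is closed, so data rewritten
       by compactions survives a power outage. -->
  <property>
    <name>dfs.datanode.synconclose</name>
    <value>true</value>
  </property>
  <!-- Ask the OS to start writing data out behind the writer, so the
       sync at close time has less left to do. -->
  <property>
    <name>dfs.datanode.sync.behind.writes</name>
    <value>true</value>
  </property>
</configuration>
```

DataNodes normally need a restart to pick up changes like these. This only covers the HDFS side (points 2 and 3); the WAL behavior in points 1 and 4 is controlled separately in HBase.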