0. Here are more details on that: http://hadoop-hbase.blogspot.de/2013/07/protected-hbase-agai...
1. By default HBase "flushes" the WAL. "Flush" here means making sure that at least 3 machines have the change in memory (NOT on disk). A datacenter power outage can lose data.
2. When HDFS closes a block, the data is not by default forced to disk. So when HBase rewrites old data during compactions, that old data can be lost in a power outage. Again, by default.
3. HDFS should be configured with sync-on-close, so that old data is forced to disk when compactions close their blocks (and with sync-behind-writes for performance).
4. HBase now has an option to force a WAL edit (and all previous edits) to disk (that's what I added in said jira).
5. This post is 4 years old, for chrissake :)... Don't base decisions on 4-year-old information.
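The HDFS side of points 2 and 3 can be sketched as an hdfs-site.xml fragment; the property names are stock HDFS settings, and the values shown are what the durability argument above calls for:

```
<!-- hdfs-site.xml (sketch): durability settings on the DataNodes -->
<configuration>
  <!-- fsync block files when a block is closed, so data rewritten
       by compactions survives a power outage -->
  <property>
    <name>dfs.datanode.synconclose</name>
    <value>true</value>
  </property>
  <!-- hint the OS to write pages to disk as data streams in, so the
       final sync-on-close has little left to flush -->
  <property>
    <name>dfs.datanode.sync.behind.writes</name>
    <value>true</value>
  </property>
</configuration>
```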
HBase _is_ a database and it will keep your data safe. Unfortunately it requires some configuration and some knowledge.
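Point 4 refers to per-mutation WAL durability in the client API. A minimal sketch, assuming a reasonably recent hbase-client on the classpath (the class, family, and qualifier names here are made up for illustration):

```
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DurablePut {
    public static Put durablePut(byte[] row) {
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        // FSYNC_WAL forces this edit (and all previous edits in the WAL)
        // to disk before the write is acknowledged.
        put.setDurability(Durability.FSYNC_WAL);
        return put;
    }
}
```

SYNC_WAL (the default) only guarantees replication to the DataNodes' memory; FSYNC_WAL is the option that actually reaches the platters.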