Wednesday, April 16, 2014

HBase write operation

When you write or update a record in an HBase table, the internal process is the same. HBase receives the command and persists the change; if the write fails, it throws an exception. A successful write operation goes into two places:

  1. WAL (write-ahead log) or HLog.
  2. MemStore.

HBase records each write in two places to maintain data durability. The write is considered complete only after the change has been written to and confirmed in both places.
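The two-step write can be sketched as a toy model. This is not the HBase API: the `RegionServer` class, its `put` method, and the list/dict stand-ins for the WAL and MemStore are all hypothetical names chosen for illustration, and the order (WAL first, then MemStore) simply follows the list above.

```python
class RegionServer:
    """Toy model of the write path; a sketch, not the real HBase API."""

    def __init__(self):
        self.wal = []        # append-only log, stands in for the WAL (HLog)
        self.memstore = {}   # in-memory write buffer, stands in for the MemStore

    def put(self, row, column, value):
        # Step 1: record the change in the WAL.
        self.wal.append((row, column, value))
        # Step 2: apply the change to the MemStore.
        self.memstore[(row, column)] = value
        # Only now is the write reported as complete.
        return True


server = RegionServer()
server.put("row1", "cf:name", "alice")   # a new record
server.put("row1", "cf:name", "bob")     # an update follows the same path
```

In this model an update is just another `put`: the WAL keeps both entries, while the MemStore holds only the latest value for the cell.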

The MemStore is a write buffer where HBase accumulates data in memory before writing it permanently. When the MemStore fills up, its contents are flushed to disk to form an HFile. Every flush generates a new HFile. HFile is the underlying storage format for HBase. Note that an HFile belongs to exactly one column family, and a column family can have multiple HFiles; an HFile never holds data from more than one column family. The following picture shows the write process.
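The flush behaviour described above can also be sketched in code. The model below is an assumption-laden simplification: `ColumnFamilyStore` and `FLUSH_THRESHOLD` are hypothetical names, the threshold counts cells rather than bytes, and an "HFile" is just an immutable sorted tuple held in memory.

```python
FLUSH_THRESHOLD = 3  # hypothetical: flush after 3 buffered cells


class ColumnFamilyStore:
    """Toy model of one column family: one MemStore, many HFiles."""

    def __init__(self, family):
        self.family = family
        self.memstore = {}   # buffered cells, keyed by (row, qualifier)
        self.hfiles = []     # each flush appends one new immutable "HFile"

    def put(self, row, qualifier, value):
        self.memstore[(row, qualifier)] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # Every flush produces a new file; existing files are never modified.
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}


store = ColumnFamilyStore("cf")
for i in range(7):
    store.put(f"row{i}", "q", str(i))
```

After seven writes with a threshold of three, this store has flushed twice, so it holds two HFiles and one cell still waiting in the MemStore. Because each store object is tied to a single family, no file in this model can mix data from two column families.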


How does HBase handle write failures?
Hadoop is designed for large distributed systems, and so is HBase. In a distributed environment, failures are common. Imagine that an HBase server hosting a MemStore that has not yet been flushed crashes: the data in the MemStore is lost, but it can be recovered from the WAL. Every HBase server keeps a WAL to record changes (a single WAL per HBase server), and this WAL is a file on the underlying file system. A write isn't considered successful until the new WAL entry has been written successfully. You don't have to recover the data manually; HBase handles it as part of its recovery process.
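A minimal sketch of that recovery idea, under stated assumptions: the `RegionServer` class and `replay_wal` method are hypothetical, the WAL is a Python list standing in for a durable file, and a crash is simulated by discarding the server object while keeping the log.

```python
class RegionServer:
    """Toy model: the WAL survives a crash, the MemStore does not."""

    def __init__(self, wal=None):
        # The WAL stands in for a file on disk, so it can be handed
        # to a fresh server after a crash.
        self.wal = wal if wal is not None else []
        self.memstore = {}   # purely in memory, lost on crash

    def put(self, row, column, value):
        self.wal.append((row, column, value))
        self.memstore[(row, column)] = value

    def replay_wal(self):
        # Re-apply every logged change in order to rebuild the MemStore.
        for row, column, value in self.wal:
            self.memstore[(row, column)] = value


server = RegionServer()
server.put("row1", "cf:a", "1")
server.put("row2", "cf:a", "2")

durable_wal = server.wal   # the log outlives the process
del server                 # simulated crash: the MemStore is gone

recovered = RegionServer(wal=durable_wal)
recovered.replay_wal()     # unflushed writes are back in memory
```

Replaying the log in order matters in this model: if a cell was written more than once, the last logged value wins, just as it did before the crash.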

Should you skip the WAL?
Skipping the WAL will definitely improve write performance, but at the cost of losing data if the server hosting the MemStore crashes. If you disable the WAL, HBase can't recover your data in the face of failure: any writes in the MemStore that haven't been flushed to disk will be lost.
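The trade-off can be illustrated with the same kind of toy model. Everything here is hypothetical (the `RegionServer` class, the `skip_wal` flag, the `crash_and_recover` helper); in the real Java client the choice is made per mutation, for example through the `Durability.SKIP_WAL` setting, but this sketch does not call that API.

```python
class RegionServer:
    """Toy model with an optional WAL bypass per write."""

    def __init__(self, wal=None):
        self.wal = wal if wal is not None else []
        self.memstore = {}

    def put(self, row, column, value, skip_wal=False):
        if not skip_wal:
            self.wal.append((row, column, value))   # durable record
        self.memstore[(row, column)] = value        # in-memory only


def crash_and_recover(server):
    # Simulated crash: only the WAL is carried over to the new server,
    # which then rebuilds its MemStore by replaying the log.
    recovered = RegionServer(wal=server.wal)
    for row, column, value in recovered.wal:
        recovered.memstore[(row, column)] = value
    return recovered


server = RegionServer()
server.put("row1", "cf:a", "logged")
server.put("row2", "cf:a", "unlogged", skip_wal=True)

recovered = crash_and_recover(server)
```

After the simulated crash, `row1` is restored from the log while `row2` is gone, because nothing outside the MemStore ever recorded it.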
