Started working on HBase again! I thought, why not refresh a few concepts before proceeding to the actual work? The important things that come to mind when we work with NoSQL in a distributed environment are sharding and partitions. Let’s dive into the ACID properties of databases and the CAP theorem for distributed systems.
The ACID properties and the CAP theorem are two core concepts in data management: ACID for database transactions, CAP for distributed systems. Funny thing: both come with a “C”, with totally different meanings.
What is ACID: – It is a set of rules that means a lot for RDBMSs, because practically every mainstream RDBMS is built to be ACID compliant.
A=Atomicity: – All or nothing. If I own a session and fire two updates in one transaction, either both succeed or neither does; it can never be the case that one update succeeds and the other fails.
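To make the all-or-nothing idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper and the account names are made up for illustration; they are not from any particular system.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Apply two updates as one transaction: both succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount == 0:
                # Second update touched no row: fail the whole transaction.
                raise ValueError(f"unknown account: {dst}")
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)         # both updates applied
failed = transfer(conn, "alice", "nobody", 10)  # first update is rolled back
print(ok, failed, balances(conn))
```

The second call debits alice first, then fails on the missing account, and the rollback undoes the debit, so alice is not left short.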
C=Consistency: – Every transaction brings the database from one valid state to another. Once the data passes all the defined rules (constraints, triggers or any other rule), the change can be committed and the database moves to its next valid state. But if the programmer wants an application-specific rule, the database does not guarantee it, simply because the database doesn’t know your business case. Either define the rule at the database level or take care of it at the application level; otherwise the data will not be consistent according to your business rules.
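The split between rules the database knows about and rules only the application knows about can be sketched like this, again with Python's built-in `sqlite3`. The `CHECK (balance >= 0)` constraint stands in for a database-level rule, and the `daily_limit` parameter is a hypothetical business rule that lives only in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule the database enforces
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

def withdraw(conn, name, amount, daily_limit=None):
    # Business rule (hypothetical): the database knows nothing about it,
    # so the application has to check it.
    if daily_limit is not None and amount > daily_limit:
        return "rejected by application rule"
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, name),
            )
        return "ok"
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a state the database considers invalid.
        return "rejected by database constraint"

r1 = withdraw(conn, "alice", 500)                 # would make balance negative
r2 = withdraw(conn, "alice", 80, daily_limit=50)  # valid for the DB, not for us
r3 = withdraw(conn, "alice", 40)                  # passes both rules
balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
print(r1, r2, r3, balance)
```

Without the `daily_limit` check in the application, the 80 withdrawal would have gone through, since nothing in the schema forbids it.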
I=Isolation: – RDBMSs are full of latches and locks, all so that concurrent transactions do not step on each other. How much one transaction can see of another’s in-flight work is controlled by the isolation level, for example serializable (the strictest) or read only. If the isolation level is too weak for the workload, we get dirty reads, non-repeatable reads, phantom reads and other anomalies.
D=Durability: – Once a transaction is committed, the database stays in that valid state until the next valid state occurs. After a power loss or crash, committed data can be recovered by the recovery mechanism the respective database provides.
What is CAP: – The CAP theorem states that a distributed (or sharded) data system can offer at most two out of three desired properties: Consistency, Availability and Tolerance to Partitions.
C=Consistency has a different meaning here. Basically, if someone writes a value to the database, other users will immediately be able to read the same value back. If Consistency is violated, one user can get a different answer from another: if user A makes a request that gets routed to node X, and user B makes a request that gets routed to node Y, and X and Y hold different data, A and B will get different answers.
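The node X / node Y scenario can be mimicked with a toy model in plain Python. This is only a sketch: the `Replica` and `Cluster` classes are invented for illustration, and replication is simulated by an explicit `replicate()` call rather than real networking.

```python
class Replica:
    """A toy node that holds its own copy of the data."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Two replicas; writes land on X and reach Y only when replicate() runs."""
    def __init__(self):
        self.x = Replica("X")
        self.y = Replica("Y")
        self.pending = []  # writes accepted by X but not yet copied to Y

    def write(self, key, value):
        self.x.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        for key, value in self.pending:
            self.y.data[key] = value
        self.pending.clear()

    def read(self, node, key):
        return node.data.get(key)

cluster = Cluster()
cluster.write("color", "blue")

user_a = cluster.read(cluster.x, "color")       # routed to X: sees the write
user_b = cluster.read(cluster.y, "color")       # routed to Y: stale, sees nothing
cluster.replicate()
user_b_later = cluster.read(cluster.y, "color")  # Y has caught up

print(user_a, user_b, user_b_later)
```

Between the write and the `replicate()` call, users A and B get different answers, which is exactly the violation described above.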
A=Availability means that if some number of nodes in your cluster fail, the distributed system remains operational. If Availability is violated, a node might go down at any time, and a request made to that node might simply never return a response. Alternatively, if a sibling node has just been updated, a node might refuse to respond to requests until it has itself been updated to reflect the most recent state.
P=Tolerance to partitions means that if a network failure divides the nodes in your cluster into two groups that can no longer communicate, the system still remains operational. If Partition tolerance is violated, a single node going down could take even more nodes down with it, and a failure in one part of the system could spread.
Remember that the CAP theorem only applies when the connection between parts of your cluster is actually broken; applied to a healthy cluster it will only mislead you. The more reliable your network, the lower the probability you’ll need to think about CAP.
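A toy model in Python can show why the choice only appears once the link is broken. Everything here (the `Node` class, the `"CP"` and `"AP"` modes, `cut_link`) is a made-up illustration, not how any real database implements it.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""

class Node:
    def __init__(self, mode):
        self.mode = mode      # "CP" or "AP" (labels used only in this sketch)
        self.value = None
        self.peer = None
        self.link_up = True   # False simulates a network partition

    def write(self, value):
        if self.link_up:
            # Healthy network: update both copies, no trade-off needed.
            self.value = value
            self.peer.value = value
        elif self.mode == "AP":
            # Stay available: accept locally, the peer is now stale.
            self.value = value
        else:
            # Stay consistent: refuse rather than let the copies diverge.
            raise Unavailable("peer unreachable, write refused")

def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b

def cut_link(a, b):
    a.link_up = b.link_up = False

def demo(mode):
    a, b = make_pair(mode)
    a.write("v1")        # network is fine
    cut_link(a, b)       # partition happens
    try:
        a.write("v2")
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, a.value, b.value

print(demo("CP"))  # (False, 'v1', 'v1'): consistent, but the write was refused
print(demo("AP"))  # (True, 'v2', 'v1'): write accepted, but the nodes disagree
```

While the link is up, both modes behave identically; the difference shows up only after `cut_link`.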
So where does HBase fit? It is built for large, always-on deployments; HBase, after all, runs behind Facebook. However, in the world of NoSQL, HBase is considered to emphasize consistency and partition tolerance over availability.