CAP Theorem & its relevance to No SQL DB

santosh
3 min readApr 5, 2020

--

CAP theorem, also named Brewer’s theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees all the time.

- Consistency:
Every read receive the most recent write or an error. Every client must receive the most recent write or error no matter which node they connect to.
- Availability
Every request receive a non-error response without the guarantee that it contain most recent write
- Partition tolerance
Partition is communication break in the distributed system.
Partition tolerance means the cluster continue to work despite any number of communication break.
(The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes)

When a network partition failure happens we should decide to
- cancel the operation thus decrease availability but ensure consistency.
- proceed with operation thus compromise consistency but ensure availability.

CAP is often misunderstood that one has to abandon one of three guarantees all the time.When network is working fine all of the three can be achieved. Only when there is network failure or partition failure one has to choose between consistency and availability.

CAP theorem play a vital role in deciding which distributed system to choose.

Now let’s look at few nosql solutions and there score as per CAP.

MongoDb and the CAP theorem

As per CAP theorem mongoDb is CP.
MongoDb is a single-master system where each *replica set can have only one primary node that receives all the write operations. All other nodes in the same replica set are secondary nodes that replicate the primary node’s operation log and apply it to their own data set. By default, clients also read from the primary node, but they can also specify a read preference that allows them to read from secondary nodes.

MongoDb is strongly consistent in case you read from primary node and eventual consistent if you read from secondary nodes.
Also when primary nodes goes down by the the time new primary node is selected system is unavailable for write and read queries are supported if those are configured to run on secondaries.

*replica set
A replica set in MongoDb is a group of Mongodb processes that maintain the same data set. Replica sets provide redundancy and high availability,

Cassandra and the CAP theorem

Apache Cassandra is an open source NoSQL database maintained by the Apache Software Foundation.
It’s a wide-column database that lets you store data on a distributed network. However, unlike MongoDB, Cassandra has a masterless architecture and as a result, it has multiple points of failure, rather than a single one.

Relative to the CAP theorem, Cassandra is an AP database, it delivers availability and partition tolerance but can’t deliver consistency all the time.
In Cassandra, any coordinator nodes can accept read or write requests and forwards requests to respective replicas based on the partition key. Hence even if a replica/node goes down, others can serve the read/write requests. So, is it safe to say Cassandra is always available?

In Cassandra, we can define the read/write consistency level in the Cassandra client while creating the Cassandra Session.

Impact of consistency level have on Read/Write.

If the consistency level is THREE, Cassandra will read from the three replicas and return the latest data among the 3 nodes and update the other outdated replicas too.
So, Just by setting consistency level to QUORUM consistency. As these consistency level settings are applied to both reads and writes. We can achieve 100% consistency.

So, what happens to Availability?
Larger the consistency level, the availability of the system will decrease.

Write consistency level

Read Consistency level

--

--