Vladimir Sedach

Vladimir Sedach

Have Emacs - Will Hack

March 5, 2009

Facebook Cassandra database overview part 1: background

One of the most natural and popular tendencies in the application of distributed systems theory is the construction of distributed databases with the hopes of providing improved availability and scalability (in the number of requests processed per second) via data replication. While distributed queries and transactions are obviously needed for some kinds of databases, what underlies every database is the ability to read and write a sequence of bytes to some place denoted by another sequence of bytes. In database jargon this is commonly called a storage engine, while in a distributed system this can be seen as a type of distributed shared memory.

Any kind of database providing access to more than one client at a time must deal with concurrency issues and hence provide some kind of consistency model for data access and manipulation. This can make constructing distributed databases either easier or harder, as the underlying consistency model of a chosen distributed implementation can be exposed directly as that of the database, or several distributed mechanisms have to be combined in complicated ways to provide the properties mandated by the stated consistency model of the database.

Choosing a given consistency model involves large trade-offs in the performance, simplicity and ease of maintenance of the distributed database implementation. Ideally, all the nodes of the system would play the same role, and adding new nodes would incrementally improve the availability, capacity, and number of requests the system can handle. Because of the communication and synchronization overhead necessary for implementing stricter consistency models, the improvement in those aspects as a function of nodes in the system displays the quality of diminishing returns (and may in some cases provide negative ones).

Fortunately, many applications do not require strict consistency models of their database. The first to go is usually the requirement for multi-operation transactions with the ACID property. This provides the greatest improvement in potential scalability as both the locking and multi-version timestamping mechanisms for implementing transactions degrade in performance with increased contention between nodes.

The fad du jour in distributed databases is the distributed, non-transactional, non-relational database. It seems there is a new project implementing a distributed key-value, document-oriented, or "loose schema" database announced every couple of months.

Despite the hype, only a handful of these projects have large deployments. Even rarer is coming across someone who has worked on more than one such database. These facts alone make Facebook's Cassandra stand out. Cassandra was developed and is currently deployed at Facebook to handle messaging. One of the developers of Cassandra, Avinash Lakshman, has previously worked on Amazon's internal Dynamo distributed database.

The Cassandra source code has been released under the Apache 2.0 license. This is the first in a trilogy of blog posts examining Cassandra's use of modern distributed systems techniques and my observations from sitting down and reading the code.

Jonathan Ellis later commented on my writeup: http://spyced.blogspot.com/2009/05/better-analysis-of-cassandra-than-most.html