Why NoSQL?

Imagine you are the technical head of Facebook. And you are managing all the data in relational database mode. For the sake of simplicity, let’s consider that you just have to manage the friendship schema. Consider the following example for instance.

We have two tables in here. One on the left is User table and one on the right is the friendship table. The friendship table simply keep track of who is friend of whom. Now here are some facts:

1. With the ever increasing friendship data, the size of this database with such a small schema can grow enormous. Well, it can grow up to numbers of Terabytes. And if you are planning to keep this on a single disk and assuming that everything would work fine, it would be a foolish assumption. Basically, the rate at which the database would hit by millions of queries per second by millions of users worldwide would exceed 100 times with that of the reading capability of a hard drive. You therefore, instead of keep growing size of your disk, would need to divide the database into chunks and store it into clusters of disks.
Unfortunately, with relational databases, that is not possible. This is just like saving half of the data as per in the above image to one disk drive and other half in to another drive. Well you can imagine the problem. It would become pain to keep relationships intact and make them work properly. If you can’t do that with this simple schema, imagine doing that with complex schema with thousands of table sharing hundreds of relationships and getting queried millions of times in a second.

2. As the database grows in size, the queries’ response time increases. For instance, in the image above, assume that the data has grown to enormous size as more girls become friend of Neelesh. Now, if I want to find friends of my friends (well obviously to send more friend requests), I would write a simple algorithm.

STEP 1. GET ALL FRIENDS (female) OF NEELESH
STEP 2. ITERATE THROUGH THE LIST GOT IN STEP 1
STEP 3. GET UNION OF ALL OF THE FRIENDS (female) OBTAINED VIA ITERATING IN THE ABOVE LIST
STEP 4. RETURN RESULT

Rather simple algorithm isn’t it? The execution time of the above query would approximately be 0.016sec.

Now let’s try something more interesting. Let’s get the list of Friends of Friends of My Friends, i.e. by going deeper to third level.

STEP 1. GET ALL FRIENDS OF NEELESH
STEP 2. ITERATE THROUGH THE LIST GOT IN STEP 1 ITEM BY ITEM
STEP 3. TAKE ONE ITEM FROM THE LIST IN STEP 2 AND GET LIST OF FRIENDS FROM IT.
STEP 4. ITERATE IN THE THE LIST OBTAINED IN STEP 3, GET FRIEND OF EVERY ITEM AND GET UNION OF THEM.
STEP 5. REPEAT STEP 3 UNTILL THERE ARE ITEMS IN THE LIST.
STEP 6. GET UNION OF ALL OF THE FRIENDS OBTAINED VIA ITERATING IN THE ABOVE LISTS
STEP 7. RETURN RESULT

The execution time of above algorithm is near about 30 seconds. If you go further deep into the fourth level the execution time would grow to approximately 1534 seconds. On getting deeper to fifth level, well the execution time would be infinite and the query wouldn't return anything.
Now you can imagine how hard it is to query data in the relational databases if the size of database is huge.

Mainly because of the above two reasons (and others), relational databases loses their usefulness when huge datasets comes into the picture, with millions of information and thousands of relationships are there to work with.

Therefore came NoSQL databases. We will learn about them in next post.