Saturday, November 15, 2014

When to go from SQL to NoSQL databases?

While SQL has become lingua franca of relational database systems, SQL <> RDBMS. SQL is overwhelmingly  query language on RDBMS, terms, SQL database and RDBMS are used interchangeably.  I developed SQL features in RDBMS at Sybase, Informix & IBM. So, I feel good about it.

On the other hand, the rebels use NoSQL as an antonym to SQL.  While the popular expansion of NoSQL was Not-Only-SQL, it usually means the database does not support SQL... at least, it won't support good bit of SQL operations like joins, aggregations, etc.

Let’s assume, you have an application using SQL database.  Your boss returns from another big data conference — where they’ve invariably pitched the power of NoSQL — and asks you: “Josh,  why aren’t you moving to NoSQL?”.

While — volume, velocity, variety — give quick reasons to get to NoSQL, let’s look at the reasons and scenarios in bit more detail.

Let’s look at each criteria:

1. Critical Stable Application:   You’re running a mission critical application developed in-house that hasn’t changed in years. The database is reliable. Backup scripts work fine.  Report queries work reliably.
Backup & restore scripts are verified. Hot stand by is always ready. Your hardware can store & handle the data for next two years. And all the users you expect.

So, there is really no obvious reason to consider the alternative approach. 

2. A Very Dynamic Application: You’re attending meetings after meetings schema re-design?  You’re planning additional columns to the table in case it’s needed later on.  You add, unnamed_int, unnamed_varchar32, etc so when the request comes to add a new type, you can instantly provide that just by renaming the column.  You’re a hero!  However, just to decide the need for a particular column, there are numerous meetings.

Application is adjusting to market requirements.  Hence you need to change the data stored in tables — new data, new forms of data.  Any time you change the application, you need downtime.  And the changes never end. 

This is definitely a major symptom to consider NoSQL databases which usually have flexible schema. You only create the table and don’t specify the columns and types in NoSQL.  

In fact,in MongoDB, you don’t need to create the database or table ahead of time.
Simply start USEing a database, start to SAVE data into a table.  Database and tables are automatically created for you.  No column names to decide, no type lengths to think about.

3. Dynamic data sources: Many applications like single view of customer need to integrate new data sources: e.g. twitter feed of the customer or sales from the company you just acquired.  Changing data sources means changing the schema. Changing schema does not only mean you’re adding the columns — It also means, there could be conflicts that cannot be resolved. 

One data source has customer-id defined as integer and another as alphanumeric. That would force you to keep all the data in one superset data type — in this case, alphanumeric.  It would also mean, any existing data should be migrated.  Then, you need to change the SQL on this schema. Testing everything.

4. Scale-up and Scale-out:  I was watching shark-tank yesterday.  One of the contestants said, "year to date sales is 2 million dollars and we expect to close the year with 7 million dollars". Even the experienced sharks were surprised.  The reason for this was 75% of their sales come during christmas season.

Each business has variability due to seasons and events.  If you don't handle increased demand seamlessly, you'll risk losing revenue at the point of greatest profitability.

RDBMS can scale.  They've scaled from single processor systems to tens or even hundreds of processor-cores in a single machine, known as SMP -- symmetric multi-processor machines.    This is scale-up.  Every time you scale-up, you need planning, downtime, data migration and power-on procedures.  This is too costly, slow and risky.  

NoSQL systems were designed to scale-out.  When you reach system capacity, you simply add new systems into the cluster.  Data and workload gets distributed to the new configuration seamlessly.  No application downtime!   This is highly suited for modern commodity clusters and instant provisioning on the cloud.

5. TCO -Total cost of Ownership & Operation: Roughly, RDBMS vendors charge PER CORE.  Ivy Bridge from Intel has  12-core and 15-core model. NoSQL vendors charge PER NODE (machine within certain configuration limits).  The list price per node (defined as machine with up to 24-cores)  is typically the price of 2-cores.  So, 12 times difference right there!

So, it should be an easy call, right?  Unfortunately, not.  There is a huge feature difference between SQL and NoSQL systems. 

RDBMS does more than simple get and put of the row. Once you design a good schema, the number of SQL you can issue on this schema is infinite.  Data architects understand the data relationships and general application requirements and design the schema. Applications have flexibility on the queries they issue on this.  

In NoSQL, you're often encouraged to use the primary-key.  Often (e.g. HBase), primary-key is the only way to access the data efficiently. 

While RDBMS provides rich functionality, using the 80/20 hypothesis, majority of applications are using under 20% of the features.  Even the 20% of the RDBMS features used by these apps are often considered advanced NoSQL features!  However, many of them have workarounds that make migration worthwhile. When you consider the cost difference, some work arounds are worth it!

6. Read-Write Patterns: RDBMS generally assumes you write the data once and read it many-may times.  Assuming 80% SELECT and 20% INSERT/UPDATE/DELETE the typical data pattern.  Hence, a slower WRITE throughput on the system isn't penal. There are data warehousing situations where you're loading large amounts of data.  And there are many tools and techniques to do this.  Special loaders which load without the overhead of transactional log is one of them.  Other is to create a separate table, prepare the data in a separate table and then simply ATTACH that table into a single RDBMS table online.

NoSQL databases are designed to handle continuous stream of data at a very high speed.  And since you're advised to create minimal additional indices, the udpates will be eve faster.   The WRITE patterns in NoSQL suitable applications is typically much higher than 20%. 


Going from a SQL to NoSQL system is a lengthy process. The pros-cons should be considered carefully. NoSQL systems offer exciting new technologies to solve many business problems now and will eventually mature to support complex operations and applications. 

A Deep Dive Into Couchbase N1QL Query Optimization

[Reposting of the article published with Sitaram Vemulapalli on DZone.