Friday, December 20, 2013

Informix NoSQL: hybrid storage and access details.


On December 17th, John Miller talked about Informix NoSQL hybrid storage.

The presentation and audio will soon be posted at: http://ibm.co/JFkr1B

In Q&A, there was a question and comment by Lyn Robison from Gartner on advising customers to use the right data model for the right problem/application. We completely agree. We'd like customers to use the right model for the right problem. With informix not only you can use either standard relational model or flexible schema with JSON for your appdev. Obviously, this can create data silos if your API is tied to specific data model. We believe data model should not restrict data access. Hence the hybrid access.






So, how does this hybrid access work in Informix NoSQL?
Informix provides both SQL and NoSQL semantics. You can define regular relational schema or simply use MongoAPI and Informix creates the database, collections and documents just like MongoDB does. Let’s look at the implementation details. Some of these are already in our detailed deepdive at:slidesha.re/1gEGXW6

Data representation:

SQL:  SQL (relational) data is stored in regular tables with rows and columns.  Logical schema is distinct from the physical schema.

NoSQL: Flexible schema for the data means, no upfront definition of tables or rows or columns or their types. Each row could have random values.  To achieve this, NoSQL databases like MongoDB store data in JSON (actually in its binary form called BSON).  JSON is a series of key-value pair.  You can nest key-value pars within other key value pairs to form hierarchical structures or arrays.  This is generally referred to as document-structure.

For each NoSQL collection, we create a table with a BSON data type. BSON is Binary JSON (http://bsonspec.org/). JSON is Java Script Object Notation (http://www.json.org/).  All the MongoDB APIs exchange information with the server using BSON.   When the client sends data as BSON, it’s stored AS IS in BSON.

NoSQL pays more attention to application flexibility and agile appdev instead of storage efficiency.  So, for now, additional space for key in key-value pair is fine.  Eventually, all the databases will be looking at making JSON/BSON storage efficient. Within Informix, you can use compression to get space savings. 

Query Processing:
NoSQL API on SQL data:

Relational data (relations or resultset) can treated as “regular” JSON documents; columnname-value becomes key-value pair.  So, translating between relational data to JSON and vice-versa becomes easy.  

This helps enterprises with large dataset in relational tables to use it in the web appdev stack like MEAN.  

SELECT partner, pnum, country from partners;

 partner                 pnum            Country
 Pronto                   1748                    Australia
 Kazer                    1746                    USA           
 Diester                  1472                    Spain
 Consultix                1742                    France

{parnter: “Pronto”, pnum:”1748”, Country: “Australia”}
{parnter: “Kazer”, pnum:”1746”, Country: “USA”}
{parnter: “Diester”, pnum:”1472”, Country: “Spain”}
{parnter: “Consultix”, pnum:”1742”, Country: “France”}

Listner translates the query and the data object between relational and JSON/BSON form.  

db.partners.find({name:”Pronto”}, {pnum:1, country:1}).sort({b:-1})

SELECT a, b FROM t WHERE a = 2.0 ORDER BY b DESC;

db.partners.save({pnum:1632, name:”EuroTop”, Country: “Belgium”});
INSERT into partners(pnum, name, country values(1632, ”EuroTop”, “Belgium”);
db.partners.delete({name:”Artics”});
DELETE FROM PARTNERS WHERE name = “Artics”;
Db.partners.update({country:”Holland”},{$set:{country:”Netherland”}}, {multi: true});
UPDATE partners SET country = “Netherland” WHERE country = “Holland”;

The moment you have this kind of access, from MongoDB API, you can exploit the relational database features like transactions, views, joins, grouping, OLAP window functions, stored procedures, etc.

In this case, if a JSON query references a non-existent column, they’ll get the error.  The intent is not to simply extend existing relational schema, but to make the existing enterprise data available to new APIs seamlessly.

SQL on NoSQL data:

If you have the following JSON collections:

JSON Collection V:
{ "_id" : ObjectId("526a1bb1e1819d4d98a5fa4b"), "c1" : 1, "c2" : 2 }

JSON Collection w:
{ "_id" : ObjectId("526b005cfb9d36501fe31605"), "x" : 255, "y" : 265, "z" : 395}
{ "_id" : ObjectId("52716cdcf278706cb7bf9955"), "x" : 255, "y" : 265, "z" : 395}
{ "_id" : ObjectId("5271c749fa29acaeba572302"), "x" : 1, "y" : 2, "z" : 3 }

Ideally, you would want to use SQL like this:

SELECT V.c1, V.c2, W.x,W.y,W.z
FROM V, W
WHERE V.c1 = W.x;


Currently, you’ll have to use expressions and dotted notations to extract the specific key-value pairs.

SELECT bson_value_int(jc1.data, 'x'),
       bson_value_lvarchar(jc1.data, 'y'), 
        bson_value_int(jc1.data, 'z') , 
        bson_value_int(jc2.data, 'c1'), 
        bson_value_lvarchar(jc2.data, 'c2') 
FROM w jc1, v jc2
WHERE  bson_value_int(jc1.data, 'x') =   bson_value_int(jc2.data, "c1");

You can also create views on top of these and make the access much simpler for application developers.

create view vwjc(jc1x, jc1y, jc1z, jc2c1, jc2c2) as
SELECT bson_value_int(jc1.data, 'x'),
       bson_value_lvarchar(jc1.data, 'y'), 
        bson_value_int(jc1.data, 'z') , 
        bson_value_int(jc2.data, 'c1'), 
        bson_value_lvarchar(jc2.data, 'c2') 
FROM w jc1, v jc2
WHERE bson_value_int(jc1.data, 'x') = bson_value_int(jc2.data, 'c1');

Summary:
You can model using relational or NoSQL concepts within the same database and access data from  either SQL and MongoDB API without replicating the data or having ETL.  Since you just have one copy of the data, you'll be accessing the consistent copy of the data.












A Deep Dive Into Couchbase N1QL Query Optimization

[Reposting of the article published with Sitaram Vemulapalli on DZone.  https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-op...