Tuesday, November 18, 2014

Notes on Real Time Data Processing

Real time is real hot right now. Days became hours, hours became minutes, minutes became seconds, seconds became nanoseconds. Time is money. Go ahead. Show me the money. This is the mantra from all the hardware and software vendors: do the analytics in real time, and your decisions, results and revenues will automatically jump up. When you perform a database transaction, events are triggered by an exchange of messages using some protocol. The systems record changes as the events unfold. Analysis of this data should happen immediately after the event. So, colloquially, real time processing is the analysis of data as the event is unfolding.

But, what is real time?

In all of the cosmos, there is one, just one, phenomenon we know of that appears to happen in real time. The EPR thought experiment was devised by Einstein, Podolsky and Rosen to argue that quantum theory was an incomplete explanation of reality. Simply put, it shows that measuring one of a pair of entangled particles seems to affect the other instantly, even across the length of the universe. This appears to conflict with Einstein's special theory of relativity, which states that nothing can travel faster than light. Experiments have shown that EPR's REAL TIME effect, seemingly faster than light, does happen, although it cannot be used to send a usable message faster than light. Such is the strange quantum world. See http://en.wikipedia.org/wiki/EPR_paradox if you’re curious.

So, in physics, you hardly hear about real time travel or information processing. The speed at which information travels, except seemingly in the quantum world, is restricted by the speed of light.


Real Time in Software

Computer science and software, of course, won’t be bothered with such constraints. Everything from streaming, fast data ingestion, messaging and business intelligence to queries on the stream claims real time computing. There is, however, a restrictive and reasonable definition of real-time computing in Wikipedia: hardware and software systems that are subject to a "real-time constraint", for example operational deadlines from event to system response. Real-time programs must guarantee response within strict time constraints, often referred to as "deadlines". The “real-time constraint” can be defined exactly in cases where a violation has an immediate effect, such as a fire extinguisher system or a furnace temperature monitor.
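To make the idea of a deadline from event to system response concrete, here is a minimal sketch in Python. The names run_with_deadline and DeadlineResult are made up for illustration, and the 5 ms deadline is an arbitrary assumption. Note that this only detects a missed deadline after the fact; a true hard real-time system has to guarantee the deadline up front, which general-purpose Python on a general-purpose OS cannot do.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeadlineResult:
    """Outcome of one event-to-response measurement (hypothetical helper)."""
    elapsed_s: float
    deadline_s: float

    @property
    def met(self) -> bool:
        # The constraint is met only if the response arrived within the deadline.
        return self.elapsed_s <= self.deadline_s


def run_with_deadline(
    handler: Callable[[], None],
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> DeadlineResult:
    """Run handler and report whether it responded within deadline_s seconds.

    A monotonic clock is used so wall-clock adjustments cannot skew the result.
    """
    start = clock()
    handler()
    elapsed = clock() - start
    return DeadlineResult(elapsed_s=elapsed, deadline_s=deadline_s)


if __name__ == "__main__":
    # Assumed example: the handler must respond within 5 ms of the event.
    result = run_with_deadline(lambda: sum(range(1000)), deadline_s=0.005)
    print("deadline met" if result.met else "deadline missed", result.elapsed_s)
```

In the furnace-monitor kind of system, a missed deadline is a failure of the system itself, so a check like this would at best feed an alarm or a test suite. It would not be the mechanism that provides the guarantee.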

In other cases, e.g. stock or options trading, real time means doing it faster than the competition to maximize revenue & profit. The faster you do it, the higher the revenue/profit possibility. So, for that, what’s the limit? A nanosecond? The Planck time? Here, every nanosecond counts. Every type of bandwidth counts. Every optimization counts. All to save precious time.

So, it's important to realize which use case your real time system is handling -- whether you're time-bound to ensure you avoid catastrophe or you're chasing time's tail to maximize profit. Once you determine your constraint, it's easier to design the system and select the right architecture & tools to meet it.

For an interesting account of the evolution of the term real time, see Phillip Laplante's article "It Isn’t Your Father’s Realtime Anymore" at: http://queue.acm.org/detail.cfm?id=1117409

A Deep Dive Into Couchbase N1QL Query Optimization

[Reposting of the article published with Sitaram Vemulapalli on DZone.  https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-op...