What is NoSQL and is it pornography?
What makes something a NoSQL data store? What are the quintessential elements that define the NoSQL movement? Will NoSQL forever remain a nebulous concept, in the same "I know it when I see it" category as pornography?
These were the thoughts on my mind yesterday as MySQL 5.6 came out with a press release touting "NoSQL features." And what exactly were these much touted features? A memcached interface. For those of you who are not familiar with the memcached interface, here it is in all its 13-call glory. I can attach that to Excel, yet I will not end up with NoSQL.
This whole marketing overreach is like boasting that your neon yellow Miata S2000, with the enormous after-market wing that you added with your buddies' help, has "Bugatti features." You just might feel a little rattle when you speed past 35. Funny how the real Bugatti owners never notice that at 150.
Clarity through the marketing fog
But we can't blame MySQL for trying to buy into the NoSQL hype by co-opting a poorly defined term. If there is an overhyped area right now, it's clearly Big Data in general, and under that heading, NoSQL is the worst offender (leading by a nose over Hadoop).
Now, there are lots of legitimate reasons for the excitement around this area. In fact, I would argue that the Big Data area is much more promising than what even the loudest voices in the press are proclaiming and forecasting. We really have lots of data, and we have an enormous shortage of tools that can keep up with the demand for functionality, scale and speed. It's an exciting space.
But the excitement brings out the marketing engines, and they make so much noise that it gets difficult to tell traditional RDBMSs (think Oracle or IBM's System R from That '70s Show, the other one) apart from revolutionary NoSQL data stores.
And we, the NoSQL folks ourselves, have been absolutely horrible at characterizing the area. If you look around with a critical eye, you hardly ever see anyone characterize NoSQL in positive, constructivist terms. The negativity is embedded in the name: No SQL. Well, my toaster has no SQL, so is it NoSQL? How can I tell? Not with that definition, I can't.
It's not about BASE vs. ACID
Some people have come up with the acronym BASE to characterize the NoSQL movement, which contrasts nicely with the ACID guarantees of traditional RDBMSs. Catchy it may be, but it misses the mark by a mile:
Basically Available: This term has no technical definition, but I think I know what it means. It's exactly like when I'm late to a meeting, and not just a little late, but late as in I-was-stepping-into-the-shower-around-the-time-when-you-guys-were-starting kind of late, and I call later from the car and say "On my way, I'm Basically There." Basically something is what we say when it's very much not that thing at all.
Soft-state: Nice, a technical term whose definition we can all agree on. Let's just phrase it in layman's terms for Devops Borat: the data, it ain't safe in there. Better store it in something else if you want to see it again.
Eventual Consistency: Ah, lest we forget, way to roll a specific and incredibly mediocre consistency model into the definition, because whoever came up with the acronym just cannot fathom a world where NoSQL stores actually provide strong guarantees.
Face it: eventual consistency is no consistency at all. Data stores which never write to storage (e.g. by considering a write done when it's in the client's send buffer) are eventually consistent. Data stores that always read 42 are eventually consistent. And I do not mean either of these facetiously -- look up the technical definition if you don't believe me. When will you see your data again? Eeee-ven-tual-ly. I learn it from a book. A marketing book on BASE and NoSQL. Also, Wikipedia.
Wiki gets it wrong
The definition of NoSQL in Wikipedia is pretty confused as well. Take a look at the first sentence:
In computing, NoSQL (commonly interpreted as "not only SQL") is a broad class of database management systems identified by non-adherence to the widely used relational database management system model.
For one, it's blatantly revisionist: NoSQL is not commonly interpreted as "not only SQL", not now and certainly at no time in the past. For another, it's a totally useless definition. A text file I open up in notepad also does not adhere to the RDBMS model, but it's not NoSQL.
In Wiki's defense, someone has tagged the article for being out of compliance with Wikipedia's quality standards. Clearly, our community is way behind the curve -- even the hip-hop and punk rock pages have a better characterization of what their movement is all about.
Towards a Proper Definition
Let me, in a spirit of positivity on Valentine's Day, offer three useful observations that might help capture the NoSQLness of a piece of software:
It's not the API: What sets NoSQL systems apart from RDBMSs is definitely not their interface. We're beginning to see NoSQL systems offer SQL bridges, and we're beginning to see RDBMSs slap NoSQL makeup on their APIs. So, whether a data store does SQL or not is just not the differentiating factor.
It's not the features: What sets NoSQL systems apart from RDBMSs is definitely not the features or properties of the systems. Most are eventually consistent, but some second-generation NoSQL systems, like HyperDex, provide strong consistency guarantees. Some can lose data to failures, whereas others treat data gingerly even through disasters. Some are fast, and some have speedy-looking Cadillac wings but top out at 45 on the highway. So any attempt at finding a common set of features or properties, like BASE, is destined to fall short.
It's the architecture: What sets every single NoSQL system I know apart from RDBMSs is its internal architecture. From the 1970s until about 2000 or so, every database looked identical. It looked like a clunky centralized server, designed in a way where the server takes your data, forces you to declaratively specify what it is that you want, and takes a "father knows best" attitude in giving it back to you. It does all kinds of things like computing "optimal" query plans to suck your data back off of the disks through the hose known as the disk head. NoSQL looks nothing like this. And that's why it's fast, that's why it scales, and that's why things that used to be easy for RDBMSs, like ACID transactions, are tricky on NoSQL.
A Positive Definition
My definition of NoSQL, which I tried to make as precise yet as inclusive as possible, is as follows: "NoSQL: a broad class of data management systems where the data is partitioned across a set of servers, where no server plays a privileged role."
I think this definition accommodates every system I would characterize as NoSQL (e.g. HyperDex, Riak, Cassandra, MongoDB, etc.), and it rules out foggy marketing speak. Feel free to pick it apart and to refine it, so that it may, one day, be better. Eventually. Or, ideally, within a time bound.