On the Futility of Custom Consistency Models

In his post-Thanksgiving piece on Thanksgiving meals, Krugman writes that we, as a society, tend to overstate individual differences:

First, on the no best diet point: We tend, as a culture, to overstate individual differences. Turn on CNBC and you’ll see lots of ads for accounts that let you invest to meet your needs, or something; um, the vast majority of people should NOT be making investment choices, they should just park their money in an index fund. The same for insurance policies, whatever — and even on consumption, how many people really, really gain a lot from being able to, say, customize their fast food?
[Image: a heavily customized burger. Caption: Custom heart attack.]

The same is true for data consistency models. There is a cottage industry of academics churning out paper after paper, proposing consistency models with ever so slightly different semantics than existing ones. Others are trying to build systems where the user gets to specify exactly which consistency model they want for each data item. Actually, there are far fewer of the latter kind, because building things that actually work is hard, so instead we get "simulated systems," data stores that don't handle failure cases, "Big Data" papers where objects are 1 byte in size and there are only a million of them, and so forth. The bottom line is that there is some noise about weak consistency models and about systems that support multiple weak models.

In reality, few people would actually benefit from being able to specify that, say, they would like strong consistency for their business data, but eventual consistency for the database that holds the boss's inventory of Magic the Gathering cards. Is there a case to be made that these two types of items are different and updates to them could be propagated differently? Yes, absolutely. Is there a case that says that taking advantage of such differences yields a tangible benefit? The answer so far is a loud and resounding no. Being able to pick between multiple weak models is just much ado about nothing as long as none of them can outdo the best of breed system that gives you the strongest consistency model.
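To make the critique concrete, here is a minimal sketch (a toy, not any real client library) of the kind of per-item tunable-consistency API these systems propose. Every call site becomes a policy decision, which is exactly the customization burden the paragraph above argues few users benefit from. The class and enum names are illustrative assumptions.

```python
# Hypothetical sketch of a "pick your consistency per item" store.
# This is a toy illustration, not a real data store client.
from enum import Enum

class Consistency(Enum):
    EVENTUAL = 1
    CAUSAL = 2
    STRONG = 3

class TunableStore:
    """Toy in-memory stand-in for a multi-model data store."""
    def __init__(self):
        self._data = {}

    def put(self, key, value, consistency=Consistency.EVENTUAL):
        # A real system would route the write through a different
        # protocol per level; the toy just records the level.
        self._data[key] = (value, consistency)

    def get(self, key):
        value, _level = self._data[key]
        return value

store = TunableStore()
store.put("invoices/2013-11", {"total": 1200}, Consistency.STRONG)
store.put("boss/mtg-cards", ["Black Lotus"], Consistency.EVENTUAL)
```

Note that even in this toy, the application programmer must now justify a consistency choice at every write, for every data item, forever.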

The temptation is to think that distributed systems work like Lent: by giving something up that we'd ideally like to have, like consistency, we can get in return something we badly want right now, like performance. First-gen NoSQL did exactly this. Mongo, for instance, preemptively jettisoned consistency to get availability and performance. Yet it loses data, and its performance is nowhere near that of a modern data store like HyperDex. There is no "fork of the CAP theorem" where the data store is allowed to just lose the data. When HyperDex offers the strongest consistency guarantee (serializable ACID transactions that span multiple keys) at 3x the speed of Mongo, that "tradeoff" is no tradeoff at all, just a strict loss. Someone made a Faustian bargain, and Beelzebub didn't live up to his side of the contract. Who ever could have guessed?

To be fair, there are some reasons why one might choose Mongo over HyperDex. Ease of finding someone who is familiar with the Mongo API is probably the main one. Using HyperDex requires learning a slightly different, and more powerful, API, and that takes effort.

Systems that offer a strong, well-understood model, and do it well, are going to be faster than spaghetti-code systems that try to support multiple models. Now, I am not claiming that weakly consistent systems will never make sense. But they do not in the current era, at least not based on any evidence I can see. Anyone who is advocating one needs to compare to the baseline best-of-breed system, like HyperDex or RAMCloud. If a data store cannot outperform the system with the strongest consistency model, then these multiple levels of consistency systems are just a breeding ground for bugs and heartache, glorified bedtime stories for grownups. "And then Little Red Riding Hood stored all her data at different levels of consistency. Isn't that nice? Now go to bed."
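The comparison the paragraph calls for can be sketched as a simple harness: run the same workload against a strongly consistent baseline and against the weakly consistent candidate, and see whether the candidate actually wins. The store names and the `time_workload` helper below are illustrative assumptions, not a real benchmark of any particular system.

```python
# Minimal sketch of a baseline comparison harness. Both contenders
# expose the same put/get interface; only the backend differs.
import time

def time_workload(store, n_ops=10_000):
    """Time n_ops put/get pairs against a store object."""
    start = time.perf_counter()
    for i in range(n_ops):
        store.put(f"key-{i}", i)
        store.get(f"key-{i}")
    return time.perf_counter() - start

class DictStore:
    # Stand-in for a real client library; in a fair comparison each
    # contender would be a real networked store under the same load.
    def __init__(self):
        self._d = {}
    def put(self, k, v):
        self._d[k] = v
    def get(self, k):
        return self._d[k]

baseline = DictStore()   # strongly consistent best-of-breed
candidate = DictStore()  # weakly consistent contender

# The weaker model only earns its keep if speedup < 1, i.e. the
# candidate finishes the same workload faster than the baseline.
speedup = time_workload(baseline) / time_workload(candidate)
```

The point of the sketch is the methodology, not the numbers: without this head-to-head against the strongest-consistency baseline, a weak model's claimed performance win is just an assertion.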

[Image: a teenager's custom burger. Caption: This is why you don't let a teenager design his own burger.]

Since we're discussing what principles ought to drive system design, let's not forget that we are not living in the disco '70s: tube socks are nowhere to be found, my long curly hair only appears in a few faded family photos now kept under lock and key, and system designers no longer have to worry about how their code compiles down to microcode. Code maintainability is paramount for most projects, big and small. A system that is correct by the skin of its teeth today will develop subtle flaws as its requirements silently evolve out of the design envelope of the underlying weak consistency model. You might think that the boss's Magic the Gathering database would never evolve to hold half a billion dollars' worth of digital assets, but we know how that turned out.

Back to Krugman's point: if I could have a full balanced meal every day, prepared faster than fast food, it'd be weird to eat burger after burger. Even if I were somehow convinced, in some mon(g)omaniacal quest, that eating burgers every day was a perfect match for my current lifestyle, doing so would be ridiculous.

