Introducing HyperDex Warp: ACID Transactions for NoSQL

The News

My research group had a big announcement today, a new technology we've dubbed HyperDex Warp. Warp is the long sought after unicorn in the NoSQL world: a light-weight, distributed, sharded data store that supports transactions over multiple keys. It combines the speed and scale advantages of NoSQL systems with the ACID guarantees that you are familiar with from traditional RDBMSs. HyperDex transactions are fully-decentralized and operate on multiple objects.

HyperDex Transaction Semantics

We support traditional ACID transactions with one-copy serializability. There is no catch. The system is distributed, sharded and concurrent, and yet it ensures that the transactions appear to have been applied to the data one after another, atomically, consistently, with isolation, and with a fault-tolerance guarantee.

The following code shows how the "hello world" example from transaction processing 101 is implemented on the HyperDex API:

t = hyperdex.begin_transaction()
bal1 = t.get("accounts", "tim")["balance"]
bal2 = t.get("accounts", "joe")["balance"]
bal1 -= 100
bal2 += 100
t.put("accounts", "tim", {"balance": bal1})
t.put("accounts", "joe", {"balance": bal2})
t.commit()

This example illustrates how a single client can modify two accounts at the same time. The updates to tim and joe's accounts will either take place together, atomically, or not take place at all.

There is no "Catch" or Fine Print

There is a lot of noise around NoSQL systems, and some systems like to call atomic operations on a single objects "transactions." Such systems do not support transactions in the traditional sense. HyperDex also supports atomic operations on single objects, in addition to transactions, and has done so for a while now.

But a sequence of atomic operations is not the same thing as a proper transaction. Let me illustrate with a simple example. The following (legal, supported) HyperDex code will not have the same effect as the transaction example above:

hyperdex.atomic_sub("accounts", "tim", {"balance": 100})
hyperdex.atomic_add("accounts", "joe", {"balance": 100})

These two operations, while they are individually atomic, will not execute together as a single, indivisable, atomic unit. There may be a time window where the net sum of money in the system is incorrect. And other operations may be interleaved in between such individually atomic, but non-transactional, sequences of operations. If you take a backup during this window, for instance, your bank may end up with unreconcilable books.

What Can Warp Transactions Do?

HyperDex Warp provides a true transactional interface. A Warp client has access to all of the datatypes and key-based operations that HyperDex offers. Clients can start any number of transactions concurrently, and issue, in any intermixed order, operations in each of those transactions. A client can access any number of objects within a transaction, and can modify any number of them.

Warp transactions provide isolation. The side-effects of transactions are not visible until they successfully commit. While within the transaction, the process will be isolated from changes concurrently being applied by other transactions.

Transactions employ optimistic concurrency control, wherein conflicting transactions may abort, in which case they have no effect on the database. Applications whose commit did not succeed will typically catch an exception and reapply their changes.

A successful commit takes place atomically, as if the set of modifications were applied on all objects involved in the transaction at the same time.

So, there really are no caveats, exceptional cases, or strange mechanisms and settings you have to master to understand how Warp transactions work. These are your mother's ACID transactions.

What Can Warp Transactions Not Do?

HyperDex Warp does not support nested transactions at the moment, and the usual message size and memory limits apply.

What's the Breakthrough

HyperDex Warp embodies a novel new optimistic concurrency protocol called linear transactions that enables the system to detect and avoid situations where ACID properties would be violated.

Comparison to other NoSQL Systems

I have been deluged with requests to compare HyperDex pair-wise with every existing NoSQL database, so let me quickly compare Warp to the other players in the field.

MongoDB, Cassandra, and Riak do not provide transactions over multiple objects.

The nascent players RethinkDB and Couchbase do not provide multi-key transaction support, either. The references in Couchbase's documentation to ACID are referring to their guarantees for single object operations, i.e. atomic operations. FoundationDB supports ACID transactions, but their architecture seems to be centered around dedicated transaction managers, which pose a scalability bottleneck and violate the NoSQL architecture based around distribution and sharding.

Google's Spanner system provides a transaction interface. Spanner relies on lock-based concurrency implemented using the heavy-weight Paxos protocol for coordination. HyperDex's transactions use optimistic concurrency control and rely on a lighter weight protocol.

Comparison to RDBMSs

HyperDex is a modern NoSQL store. It keeps all data sharded across a collection of machines, and uses novel techniques to coordinate the data on this cluster to provide its features. It differs from RDBMSs in its interface and architecture.

Interface

RDBMSs traditionally provide a declarative interface -- their clients express their queries in SQL, and the database is tasked with coming up with the optimal query plan for executing each query. To compute such plans effectively, RDBMS need to maintain many additional datastructures and entail great complexity, with mixed results and limited performance.

HyperDex provides an imperative API -- HyperDex clients directly store and retrieve objects through direct calls whose costs are predictable and well-characterized.

Architecture

Historically, RDBMSs imposed a centralized organization on data, where all tables were located on a single node. While recent databases have tried to diverge from this view, the nature of their SQL interface makes it difficult for RDBMS to scale. Further, RDBMSs employ heavy-weight mechanisms for locking and for distributed transaction management that can limit performance.

HyperDex relies on a distributed, sharded architecture. Since data resides on a collection servers, independent operations can execute in parallel, aiding scalability. Novel algorithms for placing data in a cluster while retaining its locality (hyperspace hashing), for replicating it in a consistent manner (value-dependent chaining), and for managing operations that span multiple keys (linear transactions) enable it to provide the same ACID properties as RDBMSs.

How to Check It Out

We have made an evaluation version of Warp freely available. Follow the download instructions to install Warp. You will need a 64 bit Ubuntu, Debian or Fedora instance.

The evaluation copy is purposefully missing all of the fault-tolerance code. It is not intended to be used in production! If you would like to acquire a full version of Warp with full fault tolerance, please get in touch with the HyperDex Team.


comments powered by Disqus