Time for Better Security for NoSQL

The state of NoSQL security is about as abysmal as the state of security in RDBMS systems. Namely, wholesale databases are vulnerable to the compromise of a single client. By breaking into a single Internet-facing client or by impersonating a client without breaking into one, an attacker can steal all data in the back-end database, turning your company into Target, Jr. Where security measures are used, they are clunky, coarse-grained, and overall entail more of a hassle than the peace-of-mind they actually bring. Anyone who ever had to set up Oracle user credentials has seen the dismal state of the art in database security, and NoSQL is no better.

Someone leaked macaroons into our code

Macaroons: so tasty, so secure.

We are hardly the first people to note this industry trend. In fact, conscientious companies are well aware of these problems, and have come up with solutions. One of these solutions, a recent proposal from Google called Macaroons, offers some amazing functionality.

Macaroons were introduced to ensure that a security breach on a single GMail front-end server would not allow the attacker to read the whole world's emails. Given that Google is constantly under attack by state-level actors, and given that email compromises can lead to life-or-death situations for dissidents, it is absolutely crucial to contain what data any given client can access. This functionality is also critical in regulated settings that require stringent firewalls and control over users' actions, such as finance, and in other settings where the flow of data needs to be tightly controlled, such as healthcare and intelligence.

A second problem with database security is that of expressing interesting policies. There is a fundamental problem with all security enforcement (aka "authorization", typically built in conjunction with a user identification system, aka "authentication") systems: once you bake the security enforcement into your code, your system has forever bought into a particular security model. Sure, you could spend lots and lots of effort on this component, and try to add mechanisms for every conceivable kind of policy one might want. History tells us that this is a losing battle, as security policies evolve constantly, and every bit of functionality is both a maintenance headache and a potential vulnerability. An alternative is to just build the MVP, probably involving some primitive concept of users and access control lists (hello "CREATE USER"!), and we'll consider ourselves lucky if the authentication module doesn't store the passwords in plaintext. Typically, the auth module, the unwanted child of every database system, is dated within 6 months.

Yet it would be incredibly cool if you could express rich policies that rely on data distributed around a network. For example, there may be data that is only accessible to those people who are current employees in the company LDAP database, and marked as part of, say, the regulatory-compliance team, but only on certain days of the month, and only for certain times during the day. Calendar data should be readable and writable by a particular user, but when GMail wants to render a calendar widget on your GMail page, the server should have only read-only access. Some documents, perhaps containing license-restricted data, may be limited in the number of times they can get accessed. All of these examples involve rich security policies that bring together factoids from several different sources. Consequently, they are very difficult to anticipate in advance. For sure, we know of no system that can express them.

As a result, our colleagues at Google looked into mechanisms for enabling such rich policies, enforced at fine granularity. Their first cut at this was a system known as "Thin Mints", a variant of cookies that relies on public-key cryptography. More recently, they came up with a newer, cooler, lighter-weight approach known as "Macaroons", a decentralized authorization framework for use in distributed systems.

We recently built the first open-source implementation of Macaroons, which has now become the standard for implementations in Java, Haskell, OCaml, and Go. With its latest release, HyperDex supports Macaroons as first-class objects.

Macaroons and distributed systems go well together.

Macaroons are an excellent fit for NoSQL data storage for several reasons. First, they enable an application developer to enforce security policies at very fine granularity, per object. Gone are the clunky security policies based on the IP address of the client, or the per-table access controls of RDBMSs that force you to split up your data across many tables. Second, macaroons ensure that a client compromise does not lead to loss of the entire database. Third, macaroons are very flexible and expressive, able to incorporate information from external systems and third-party databases into authorization decisions. Finally, macaroons scale well and are incredibly efficient, because they avoid public-key cryptography and instead rely solely on fast hash functions.

Let's go through a quick tutorial that demonstrates the use and power of macaroons, using HyperDex, our state-of-the-art NoSQL database that recently added support for macaroons. We'll start slow, show you some familiar operations, and build up to an example towards the end where we implement a rich security policy that can only be expressed using Macaroons.


As in the previous chapters, the first step is to deploy the cluster and connect a client. The cluster setup below is similar to the previous chapters, so if you have a running cluster, you can skip to the space creation step.

First, we launch and initialize the coordinator:

$ hyperdex coordinator -f -l -p 1982

Next, let's launch a daemon process to store data:

$ hyperdex daemon -f --listen= --listen-port=2012 \
                     --coordinator= --coordinator-port=1982 --data=/path/to/data

We now have a HyperDex cluster ready to serve our data. Now, we create a space and declare that we will be using macaroons.

$ hyperdex add-space << EOF
space accounts
key account
attributes
   string name,
   int balance
with authorization
EOF

The added statement, "with authorization," indicates to HyperDex that we wish to have Macaroons enabled for this space.

Now that the space is ready, let's create some objects.

Using Macaroons

The core idea behind using Macaroons is that each macaroon is minted from a unique secret key. Think of this key as a shibboleth of sorts -- anyone who can utter it is a member of the secret society that can gain access. Any principal in possession of the secret is able to create a macaroon to access the object, as if they created it. In essence, it acts as a master secret, a capability, that grants total access to the object.

>>> import hyperdex.client
>>> c = hyperdex.client.Client('', 1982)
>>> SECRET = 'super secret password'
>>> account = 'account number of john smith'
>>> c.put('accounts', account, {'name': 'John Smith', 'balance': 10}, secret=SECRET)

Once an object has an associated secret, attempts to retrieve that object will fail unless accompanied by this secret:

>>> c.get('accounts', account)
Traceback (most recent call last):
HyperDexClientException: ... it is unauthorized [HYPERDEX_CLIENT_UNAUTHORIZED]

HyperDex will deny the application access to the object unless the client presents a macaroon that proves the request is authorized. Such a macaroon is called a root macaroon. The root macaroon demonstrates knowledge of the master secret that protects the object.

Root macaroons are created from the secret and then serialized into portable tokens. Under the covers, these tokens do not actually carry the secret (for if they did, someone could reverse-engineer a macaroon and obtain unfettered access to the object), but instead carry an irreversible keyed hash derived from the secret.

Let's create a root macaroon from scratch:

>>> import macaroons
>>> M = macaroons.create('account number', SECRET, '')
>>> token = M.serialize()

In this case, M is the root macaroon, and token is the serialized version of that macaroon that can be passed around easily. This macaroon provides full access to John Smith's account, and may be used to read the account information or update the account balance.
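
To make the "irreversible hash" idea concrete, here is a minimal sketch in plain Python of how a root signature can be derived from a secret with HMAC-SHA256. It is an illustration only: the `mint_root` helper and the dictionary layout are hypothetical, and this is not the token format that the macaroons library or HyperDex actually uses.

```python
import hashlib
import hmac


def mint_root(secret: bytes, identifier: bytes) -> dict:
    """Toy root macaroon: an identifier, no caveats, and a keyed hash."""
    # The signature is HMAC(secret, identifier). Knowing the signature
    # does not reveal the secret that keyed it.
    signature = hmac.new(secret, identifier, hashlib.sha256).digest()
    return {"identifier": identifier, "caveats": [], "signature": signature}


root = mint_root(b"super secret password", b"account number")

# The token carries a 32-byte digest, never the secret itself.
print(len(root["signature"]))
```

Anyone holding the same secret can recompute the same signature, which is how a server that stores the secret can check a token it has never seen before.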

Macaroons are easy to stack!

The really cool thing about macaroons is that any code can produce this token. In the best case, John Smith's browser can prompt John Smith for his bank account password, run it through a KDF to obtain a secret, and generate the macaroon from the secret. In effect, John Smith can prove he is authorized to access the account without ever having to pass the password over the network. Someone who breaks into a front-end web server would not get access to the whole database; they would at best gain access to the set of users who used the service during the compromise period, and no more!
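
As a rough sketch of the password-to-secret step, the snippet below uses PBKDF2 from Python's standard library. The choice of PBKDF2, the iteration count, and the use of the account key as a salt are all assumptions made for illustration; the tutorial does not say which KDF a real deployment should use.

```python
import hashlib


def derive_secret(password: str, salt: bytes, iterations: int = 200000) -> bytes:
    """Stretch a user password into a 32-byte secret on the client."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# Hypothetical inputs: the salt here is simply the account key.
salt = b"account number of john smith"
secret = derive_secret("john's bank password", salt)

print(len(secret))
```

The derived bytes would then take the place of SECRET when the client mints its macaroon, so only tokens, and never the password, cross the network.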

Once John Smith's browser passes the token to the bank account web server, the server can use said token to gain access to John's account object:

>>> c.get('accounts', account, auth=[token])
{'name': 'John Smith', 'balance': 10}
>>> c.atomic_add('accounts', account, {'balance': 5}, auth=[token])
>>> c.get('accounts', account, auth=[token])
{'name': 'John Smith', 'balance': 15}

While this basic example shows how to use macaroons, it doesn't fully exploit their power. The true power of macaroons stems from the ability to embed caveats into macaroons. A caveat is essentially a restriction on what the macaroon authorizes; it turns a full object capability into a restricted capability.

For instance, in our running example, John Smith may be simply reading his bank account balance from his smart phone. The app on the phone knows the request is read-only, so it may embed a caveat into the macaroon that says the macaroon is only authorized for read requests. We can easily create a read-only macaroon to accomplish this:

>>> M = macaroons.create('account number', SECRET, '')
>>> M = M.add_first_party_caveat('op = read')
>>> token = M.serialize()

This new macaroon has the caveat that it is useful solely for read operations. More importantly, this same step can be done entirely within John's end host, so his password never leaves the machine he is using. Should an attacker gain hold of the token, the most that they can do is read John's account balance; attempts to write with the macaroon will fail at the backend data store, as desired:

>>> c.get('accounts', account, auth=[token])
{'name': 'John Smith', 'balance': 15}
>>> c.atomic_add('accounts', account, {'balance': 5}, auth=[token])
Traceback (most recent call last):
HyperDexClientException: ... it is unauthorized [HYPERDEX_CLIENT_UNAUTHORIZED]
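
Why can't an attacker simply delete the caveat from a stolen token? The following sketch shows the general idea using a chain of HMACs: each caveat is hashed under the previous signature, so removing one leaves a signature that no longer matches. The helper names and the dictionary layout are hypothetical, and the real library's construction differs in its details.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    """Keyed one-way hash (HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(secret: bytes, identifier: bytes) -> dict:
    """Toy root macaroon: no caveats, signature keyed by the secret."""
    return {"id": identifier, "caveats": [], "sig": mac(secret, identifier)}


def add_first_party_caveat(macaroon: dict, predicate: bytes) -> dict:
    """Append a caveat; the old signature becomes the key for the new one."""
    return {
        "id": macaroon["id"],
        "caveats": macaroon["caveats"] + [predicate],
        "sig": mac(macaroon["sig"], predicate),
    }


def verify(macaroon: dict, secret: bytes, request_op: bytes) -> bool:
    """Recompute the chain from the secret and check every caveat."""
    sig = mac(secret, macaroon["id"])
    for predicate in macaroon["caveats"]:
        if predicate != b"op = " + request_op:
            return False  # caveat not satisfied by this request
        sig = mac(sig, predicate)
    return hmac.compare_digest(sig, macaroon["sig"])


SECRET = b"super secret password"
root = mint(SECRET, b"account number")
read_only = add_first_party_caveat(root, b"op = read")
stripped = dict(read_only, caveats=[])  # attacker deletes the caveat

print(verify(read_only, SECRET, b"read"))   # True
print(verify(read_only, SECRET, b"write"))  # False
print(verify(stripped, SECRET, b"write"))   # False
```

Note that adding a caveat needs only the current signature, not the secret, which is why any holder of a macaroon can restrict it further but nobody can loosen it.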

Macaroon caveats can be stacked or chained on top of each other, to create arbitrarily restricted capabilities. For instance, we can enhance the security of John's request to his bank by adding an expiry date to our read-only macaroon such that it is only valid for thirty seconds. With this caveat, the token becomes completely useless to an adversary thirty seconds after John's request.

We can accomplish this with the following code:

>>> M = macaroons.create('account number', SECRET, '')
>>> M = M.add_first_party_caveat('op = read')
>>> import time
>>> expiration = int(time.time()) + 30
>>> M = M.add_first_party_caveat('time < %d' % expiration)
>>> token = M.serialize()
>>> c.get('accounts', account, auth=[token])
{'name': 'John Smith', 'balance': 15}
>>> time.sleep(31)
>>> c.get('accounts', account, auth=[token])
Traceback (most recent call last):
HyperDexClientException: ... it is unauthorized [HYPERDEX_CLIENT_UNAUTHORIZED]

Macaroons are extremely efficient to construct and verify, as they rely solely on efficient hash functions and avoid public key cryptography. This means that clients may generate a new macaroon-based token for each request. Each of these tokens may have a unique expiration time very near in the future. Even if the token makes its way into the hands of a malicious user, the token can only be used for a short period of time, and subject to other caveats attached to the macaroon.
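
A per-request token of this kind might be assembled and checked along the following lines. This is a sketch of the predicate logic only, with hypothetical helper names; it assumes the two caveat forms used in this chapter ('op = ...' and 'time < ...') and rejects anything else.

```python
import time


def caveat_holds(predicate: str, op: str, now: float) -> bool:
    """Evaluate one first-party caveat against the current request."""
    parts = predicate.split(" ", 2)
    if len(parts) != 3:
        return False
    name, operator, value = parts
    if name == "op" and operator == "=":
        return value == op
    if name == "time" and operator == "<":
        return value.isdigit() and now < int(value)
    return False  # fail closed on anything the checker does not understand


def fresh_caveats(op: str, ttl_seconds: int = 30, now=None) -> list:
    """Caveats for a single request that expires shortly after it is made."""
    issued = time.time() if now is None else now
    return ["op = %s" % op, "time < %d" % (int(issued) + ttl_seconds)]


# A fixed clock value keeps the example deterministic.
caveats = fresh_caveats("read", now=1000)
print(caveats)
print(all(caveat_holds(c, "read", 1010) for c in caveats))
print(all(caveat_holds(c, "read", 1031) for c in caveats))
```

Every caveat must hold for the request to go through, so a leaked token is useless once its deadline passes or when it is presented with a different operation.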

Advanced Caveats

Macaroons enable rich security policies to be enforced. Suppose, for example, that you have a really complicated security policy that relies, say, on the phase of the moon. The traditional way to handle these policies is to embed a phase-of-the-moon-calculator into your servers. So from then on, all of your database servers would actually have code in them to calculate the current phase of the moon. Clearly, embedding such code into your servers is a terrible idea -- as security policies grow in complexity, so, too, must your backend code, introducing instability and requiring unnecessary upgrades. (One may think that the phase of the moon does not require updates, but even with this laughable example, one would be wrong: the official phase of the moon in some countries is determined by a moon sighting by the naked eye -- changes in weather and eyesight might demand a software upgrade!). It would be ideal if our security policies could incorporate arbitrarily complex facts, such as the phase of the moon, yet embody no complexity.

The way to achieve this flexibility is to incorporate a universal mechanism by which the servers can delegate the discovery of facts to third parties. Macaroons accomplish this with a lightweight mechanism called third-party caveats, which enable security policies to efficiently consult third parties for their approval.

Such third parties may verify any property of an application during their access control decisions. For instance:

  • User authentication: The third party service can authenticate the user against existing user databases (e.g., LDAP, OpenAuth, Facebook, Twitter and the like), and provide a proof that the user is the same user identified in the third party caveat. The database service needs to know absolutely nothing about these authentication measures. One can build arbitrarily complex groups or role-based access control on top of this infrastructure.
  • Auditing and logging: The third party service can log the interaction, and issue a proof that the request was logged securely in a centralized logging location. In essence, the rule for accessing an object can be "this access must be logged," and only those clients that furnish proof of having been appropriately logged can gain access to that data. This is a good fit for any context that requires regulatory compliance, such as HIPAA, SEC or Sarbanes-Oxley regulations.
  • Usage limits: The third party can check to ensure that the user does not perform a given operation more than a desired number of times. Critical metadata, such as decryption keys, may entail strict access controls where they are accessible to a certain class of users only a certain number of times. Macaroons enable such policies to be enforced.

The naive way to implement third-party caveats would be to have the database server perform an RPC out to a third-party server on every access. But this would slow down every access, and HyperDex's lightning speed would be obscured by the network latencies required for the server to consult a third host and check if the client should proceed.

Let a third party do the heavy lifting, so you can focus on what counts: consuming macaroons.

Macaroons make this process lightweight and efficient by turning the tables around. In essence, instead of placing the onus of checking security criteria on the server, they ask each client to present the reasons why the server should grant access. The academic name for this technique is credentials-based authorization, where the client presents its credentials for access. These credentials are presented in what is known as a discharge macaroon. The client says something like "I am authorized to access the company database because the moon is in the right phase, and here, I acquired this authentic, unexpired statement from the phase-of-the-moon checker that I am telling you the truth."

Behind the scenes, the server needs to simply check the authenticity of the statements against the security policy. Once again, a naive implementation would use public-key cryptography for this (and Google's initial security subsystem, known as Thin Mints, did just that until Macaroons came along), but macaroons achieve the same level of security using only efficient one-way hash functions.

Let's see how this works by implementing a user authentication service for macaroons. This service provides a means of generating third party caveats, and a method for clients to authenticate themselves with macaroons. The service exposes a call to generate caveats, whose implementation looks like this:

>>> keys = {}
>>> def add_caveat_rpc(key, user, password):
...     r = 'a random string' # your implementation should gen a rand string
...     keys[r] = (key, user, password)
...     return r

The client can then call this method (over HTTP or some other service-like interface), and retrieve an identifier for the third-party caveat.

>>> key = 'a unique key for this caveat; should be random in the crypto sense'
>>> ident = add_caveat_rpc(key, 'jane.doe@example.org', "jane's password")
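
The placeholder strings above stand in for values that should be unpredictable. One way to generate them, sketched here with Python 3's standard `secrets` module, is shown below. The in-memory `keys` table mirrors the tutorial's example; a real service would check credentials against its own user store rather than keep passwords in a dictionary.

```python
import secrets

keys = {}  # caveat identifier -> (caveat key, user, password)


def add_caveat_rpc(key, user, password):
    """Register a caveat and return a fresh, unguessable identifier."""
    ident = secrets.token_hex(16)  # 128 bits of randomness, hex-encoded
    keys[ident] = (key, user, password)
    return ident


# The client picks a random caveat key and registers it with the service.
key = secrets.token_hex(32)
ident = add_caveat_rpc(key, "jane.doe@example.org", "jane's password")

print(len(ident), ident in keys)
```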

The identifier returned from the add_caveat_rpc call can be embedded in a macaroon as a third party caveat:

>>> M = macaroons.create('account number', SECRET, '')
>>> M = M.add_first_party_caveat('op = read')
>>> M = M.add_third_party_caveat('http://auth.service/', key, ident)
>>> token = M.serialize()

Notice that the client constructs the third-party caveat using the key it provided to the third-party, and the identifier returned from the third party. The URL http://auth.service/ is a location-hint as to where the service for the third-party caveat resides.

When the client tries to use our new token, the request will be denied because the macaroon does not carry a full proof authorizing access to the object.

>>> c.get('accounts', account, auth=[token])
Traceback (most recent call last):
HyperDexClientException: ... it is unauthorized [HYPERDEX_CLIENT_UNAUTHORIZED]

To obtain this access, the client must go back to the third party and request the discharge macaroon that proves that the user can authenticate using Jane's email and password. The implementation within the third-party recalls the key, checks the username and password, and returns a discharge macaroon when the user authenticates successfully.

>>> def generate_discharge_rpc(ident, user, password):
...     if ident not in keys:
...         # unknown caveat
...         return None
...     key, exp_user, exp_password = keys[ident]
...     if exp_user != user or exp_password != password:
...         # invalid user/password pair
...         return None
...     D = macaroons.create('', key, ident)
...     expiration = int(time.time()) + 30
...     D = D.add_first_party_caveat('time < %d' % expiration)
...     return D

The application may then request a discharge macaroon from this third party service by providing it with the necessary credentials. For our example authentication service, we can generate a discharge macaroon as follows:

>>> D = generate_discharge_rpc(ident, 'jane.doe@example.org', "jane's password")

With the discharge macaroon in hand, we can provide both our original token, and the token for the new discharge macaroon as the auth parameter to HyperDex. When both tokens are provided together, the request is authorized, just as before:

>>> discharge_M = M.prepare_for_request(D)
>>> discharge_token = discharge_M.serialize()
>>> c.get('accounts', account, auth=[token, discharge_token])
{'name': 'John Smith', 'balance': 15}

One of the nice things about the macaroon structure is that any caveats added to discharge macaroons are also enforced by HyperDex. If we wait until the expiration time of the discharge macaroon has passed, the request will fail, just as it did before when the expiration was on the root macaroon:

>>> time.sleep(31)
>>> c.get('accounts', account, auth=[token, discharge_token])
Traceback (most recent call last):
HyperDexClientException: ... it is unauthorized [HYPERDEX_CLIENT_UNAUTHORIZED]
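
How can the server check a discharge without ever contacting the third party? The toy model below sketches one way this can work. It is heavily simplified and is not the library's actual construction: all names are hypothetical, discharges carry no caveats of their own, and the caveat key is hidden with a simple XOR pad where a real implementation would use proper authenticated encryption.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def seal(sig: bytes, caveat_key: bytes) -> bytes:
    """Toy stand-in for encryption: XOR a 32-byte key with a pad derived from sig."""
    pad = mac(sig, b"caveat-key-pad")
    return bytes(a ^ b for a, b in zip(caveat_key, pad))


unseal = seal  # XOR is its own inverse


def mint(root_key: bytes, identifier: bytes) -> dict:
    return {"id": identifier, "caveats": [], "sig": mac(root_key, identifier)}


def add_third_party_caveat(m: dict, caveat_key: bytes, caveat_id: bytes) -> dict:
    # Only someone who can recompute the current signature can recover the key.
    sealed = seal(m["sig"], caveat_key)
    return {"id": m["id"],
            "caveats": m["caveats"] + [(caveat_id, sealed)],
            "sig": mac(m["sig"], sealed + caveat_id)}


def bind(primary: dict, discharge: dict) -> dict:
    """Tie a discharge to one primary macaroon so it cannot be reused elsewhere."""
    return dict(discharge, sig=mac(primary["sig"], discharge["sig"]))


def verify(primary: dict, root_key: bytes, discharges: list) -> bool:
    sig = mac(root_key, primary["id"])
    for caveat_id, sealed in primary["caveats"]:
        caveat_key = unseal(sig, sealed)
        found = next((d for d in discharges if d["id"] == caveat_id), None)
        if found is None:
            return False  # no proof from the third party
        expected = mac(primary["sig"], mac(caveat_key, caveat_id))
        if not hmac.compare_digest(expected, found["sig"]):
            return False  # discharge was forged or bound to another macaroon
        sig = mac(sig, sealed + caveat_id)
    return hmac.compare_digest(sig, primary["sig"])


root_key = b"super secret password"
caveat_key = hashlib.sha256(b"key shared with the auth service").digest()
ident = b"caveat-1"

M = add_third_party_caveat(mint(root_key, b"account number"), caveat_key, ident)
D = bind(M, mint(caveat_key, ident))  # what the third party hands back, bound to M

print(verify(M, root_key, []))   # False
print(verify(M, root_key, [D]))  # True
```

The server needs nothing but its own root key: recomputing the signature chain lets it recover the caveat key and check the discharge locally, with no network round trip.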

Contextual Confinement

The beauty of the macaroons construction is that it efficiently enables the principal making a request to impose extremely strict constraints on exactly when and how its credentials may be used by the intermediate services. A principal may constrain both what the request is allowed to access, and who is allowed to make the request. For example, the client-side application may use a user's password to construct a token that cannot possibly be used in any other context, such as: "This request is authorized to read the balance of John Smith's bank account ending in *879 between 8:01 and 8:02am on Friday November 21 when presented by a client using an Android 4.4.5 phone, connected via SSL, from IP, but only if the request also includes a discharge macaroon constructed using the 2-factor authentication code sent to John's phone number as the secret." No other authentication system can take such a broad statement, and concisely express and enforce it using computation that is efficient for a cellular phone to construct, and nearly impossible for an adversary to tear apart.

More abstractly, macaroons can be thought of as restricting requests to individual cells in the prototypical access-control matrix presented in every undergraduate-level operating systems course. Each time a caveat is added to a macaroon, the resulting macaroon is authorized for a subset of the principals and objects that the input macaroon authorizes.
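
The shrinking-set view can be checked on a tiny, made-up request space. In this sketch a request is an (operation, client) pair and a caveat is any predicate over requests; both are simplifications chosen purely for illustration.

```python
from itertools import product

OPS = ("read", "write")
CLIENTS = ("phone", "laptop")
REQUESTS = set(product(OPS, CLIENTS))  # every cell of a toy access matrix


def authorized(caveats) -> set:
    """Requests that satisfy every caveat in the list."""
    return {r for r in REQUESTS if all(caveat(r) for caveat in caveats)}


root = []                                             # no caveats
read_only = root + [lambda r: r[0] == "read"]         # restrict the operation
phone_read = read_only + [lambda r: r[1] == "phone"]  # restrict the client

print(len(authorized(root)), len(authorized(read_only)), len(authorized(phone_read)))
```

Because caveats are combined with a logical AND, appending one can only keep or remove requests from the authorized set; it can never add any.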

Efficiency Considerations


Secure and low overhead.

Macaroons are excellent for use in distributed systems, because they allow applications to enforce complex authorization constraints without requiring server-side modification. Applications can use existing infrastructure to generate discharge macaroons, and provide these macaroons to HyperDex. On the server-side, HyperDex uses local and fast cryptographic operations to verify that the macaroons contain a valid proof that the user is authorized to continue their request. Consequently, it is very easy to perform per-object authorization without expensive operations on the server-side fast path.
