Back That NoSQL Up

Backups

It's always a good idea to have a good backup.

Every data store needs a good backup mechanism. Backups provide disaster recovery and protect against user error. Replication mechanisms provided by modern data stores can provide some of this same protection, but are no substitute for a proper backup mechanism.

We recently committed support for backups in HyperDex and believe that the resulting tools provide the best backup experience available in any open source or commercial data store. HyperDex backups enable administrators to backup the on-disk state of each HyperDex server in its native form. Backups are fast, and naturally support incremental backup and restore.

Let's illustrate how one can use these tools to easily take consistent backups, and talk about how they compare to other NoSQL backup solutions.

Backups at a glance

Taking a backup of the entire HyperDex cluster is as easy as issuing the command:

$ hyperdex backup-manager --backup-dir /path/to/backups

Behind the scenes, this command will:

  1. Put the cluster into read-only mode

  2. Wait for ongoing writes to quiesce

  3. Take a backup of the coordinator

  4. Take a backup of each individual daemon

  5. Put the cluster back into read-write mode

  6. Copy the coordinator state and each of the daemon's state to /path/to/backups/YYY-mm-ddTHH:MM:SS for the current date and time, exploiting data available in previous backups if possible.

Suppose the unthinkable happens (a datacenter outage, cooling failure, flood, act of God, or more likely, some human error that causes data loss, in which case some manager has to trot out the well-known "we had an awesome network but a rodent ate through the cables, fried itself and your data" defense, which is why every good manager keeps a fridge full of burnt rodents in her datacenter, but we digress) and we need to restore from backup on a different set of machines.

Restoring the cluster is as simple as copying each directory within /path/to/backups/YYY-mm-ddTHH:MM:SS to a different daemon server, and then launching a new coordinator from the backup:

$ hyperdex coordinator --restore coordinator.bin

You can then re-launch each daemon using the individual data directories within the backup. The daemons can be re-launched on entirely different servers, with different IP addresses than they were originally bound.

All freshly-restored clusters come online in read-only mode. Once all daemons have come online, you can make the cluster fully operational by executing:

$ hyperdex set-read-write

That's everything required to do a full backup and restore.

Backup Efficiency

Backups

HyperDex's backups are extremely efficient. It's possible to take a snapshot of terabytes of data in sub-second time. The secret sauce behind HyperDex's backups lies in the structure of its on-disk storage.

HyperDex uses HyperLevelDB as its storage backend, which, in turn, constructs an LSM-tree on disk. The majority of data stored within HyperLevelDB is stored within immutable .sst files. Once written, these files are never overwritten. The backup feature within HyperDex can extract a snapshot of these files via an instantaneous hard link in the filesystem. This state, once snapshotted so efficiently, can then be transferred elsewhere at will.

At the cluster level, backups pause writes for a short period of time while the HyperDex daemons take their individual snapshots. Because individual snapshots are efficient, the pause generally takes less than a second.

Finally, the backup tool ensures that the cluster is restored to full operation before transferring any data over the network.

Advanced Backup Techniques

Backup solutions often require customization; some setups use network attached storage and back up to a centralized location, others will demand custom control over where and how the snapshots are transferred. Let's walk through a more detailed example where the snapshots are manually managed.

Every custom backup solution will start by initiating a snapshot operation within the HyperDex cluster. For example, this command will create a snapshot named "my-backup":

$ hyperdex backup my-backup
11419051871340077972 127.0.0.1 /home/rescrv/src/HyperDex/d1/backup-my-backup
13855129396075833686 127.0.0.1 /home/rescrv/src/HyperDex/d2/backup-my-backup
14576759822280270796 127.0.0.1 /home/rescrv/src/HyperDex/d3/backup-my-backup

When this command succeeds, it will exit zero, and output a list of the servers that hold individual snapshots. For instance, we can see that the server identified by "11419051871340077972" on 127.0.0.1 backed up its state to /home/rescrv/src/HyperDex/d1/backup-my-backup.

We can then transfer that state to a directory of our choosing with rsync:

$ rsync -a --delete -- 127.0.0.1:/home/rescrv/src/HyperDex/d1/backup-my-backup/ 11419051871340077972

This command will create a directory named "11419051871340077972" and transfer all of the server's state to this directory. This directory is suitable for passing to the --data= parameter of HyperDex's daemon.

The backup command above also creates a file in the current directory called my-backup.coordinator.bin. This file contains the coordinator's state, suitable for using to bootstrap a new coordinator cluster as shown above.

The backup-manager command shown above executes the backup command, and the appropriate rsync commands to transfer state into the backup directory. Of course, this other possible schemes include:

  • Using filesystem-level snapshots to save the backup directories. Daemons may then be launched from the snapshots in the future.

  • Having the daemons directly rsync their state to an off-site storage location. This avoids any resource constraints that may limit the effectiveness of the backup-manager.

  • Restore multiple instances of a cluster simultaneously to experiment with your production data. For example, before deploying a new application to production, backup the cluster, and restore the backup in parallel to the production cluster. Then, deploy the new application against the parallel cluster before deploying to production.

Incremental Backups

The backup-manager command demonstrated above supports incremental backups, wherein successive calls will only transfer state that has been modified since the last backup. Each backup appears as a full backup of the cluster and shares storage resources with previous backups. Consequently, the backup manager will store several complete backups of the cluster without requiring the storage space of several complete backups.

The implementation leverages the --link-dest= option to rsync to avoid making redundant copies of the data. When provided this option, rsync will look for files in the specified directory. If the file in the source directory is also present, unchanged, in the link-dest directory, then it will be hard-linked into place and not copied over the network.

Assuming an existing backup exists in the directory 11419051871340077972-1, we can create an incremental backup in 11419051871340077972-2 by issuing:

$ rsync -a --delete --link-dest=11419051871340077972-1 -- \
    127.0.0.1:/home/rescrv/src/HyperDex/d1/backup-my-backup/ \
    11419051871340077972-2

Consistent versus Inconsistent Backups

In a sharded data store, the simple but flawed way to make backups is to take a snapshot at each of the shards independently. Needless to say, this is precisely what a number of other systems do. The problem with this approach is that backups taken without coordination among the shard servers are not going to take a consistent cut through the data. A portion of the data in the backup might come from a snapshot at time T0, while others come from T1, and yet others from different points in time. Depending on the application, these kinds of uncoordinated backups can lead to inconsistencies in the database. Restoring from a backup might take the system to a state that actually never existed. And that defeats the entire point of a backup.

In contrast, HyperDex backups are coordinated. The network pauses for slightly less than a second, the operation queues are drained, pending operations are completed, and therefore the snapshots capture a consistent, clean view of the data. A restore will take the system back to a precise point in time.

Comparison to other NoSQL backup solutions

Backups

The flexibility enabled by HyperDex's backup and backup-manager tools sets a new industry standard for NoSQL backup solutions. Other systems require support from expensive filesystem- or operating system-level primitives, a complete shutdown, or are inconsistent.

Cassandra:

Cassandra's backups are inconsistent, as they are taken at each server independently without coordination. Further, "Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the node being restored."

MongoDB:

MongoDB provides two backup strategies. The first strategy copies the data on backup, and re-inserts it on restore. This approach introduces high overhead because it copies the entire data set without opportunity for incremental backup.

The second approach is to use filesystem-provided snapshots to quickly backup the data of a mongod instance. This approach requires operating system support and will produce larger backup sizes.

Riak:

Riak backups are inconsistent, as they are taken at each server independently without coordination, and require care when migrating between IP addresses. Further, Riak requires that each server be shut down before backing up LevelDB-powered backends.

The HyperDex backup/restore process is strongly consistent, doesn't require shutting down servers, and enables incremental backup support. Further, the process is quite efficient; it completes quickly, and does not consume CPU or I/O for extended periods of time.

Wrapping up

HyperDex's backup tools enable administrators to take periodic backups of a live HyperDex cluster. The new backup tools are currently available in the open source git repository and will be included in the upcoming HyperDex 1.1 release.



comments powered by Disqus