Concurrency Improvements in HyperLevelDB

Robert Escriva and Emin Gün Sirer, June 18, 2014 at 11:00 AM
Tags: hyperleveldb, leveldb

LevelDB is a popular storage back end that Google originally developed as a standalone key-value store library. Since Google open-sourced it, LevelDB has been widely adopted and modified by others, including Facebook, Basho, and us at HyperDex. While LevelDB provides a solid foundation for building data-intensive applications, there are many ways to improve its performance across a variety of workloads.

In this article, we look at how some recent improvements to HyperLevelDB, the HyperDex fork of LevelDB, improve concurrency for multiple writers.

Improving Concurrency

Figure: LevelDB with one thread writing 128-byte values.

The various LevelDB forks provide decent performance with a single writer thread. The graph to the right shows the throughput of each variant with a single writer that inserts 128-byte objects as fast as it can. HyperDex's and Basho's forks reach about 275K operations per second with this single thread, significantly more than Google's LevelDB and Facebook's RocksDB, which achieve more modest throughput.

If you were building an application on LevelDB, you might be tempted to scale it by adding more writer threads. After all, modern servers, even virtual ones, have multiple CPU cores available. Intuitively, adding a second writer thread should increase the application's total throughput, but for most variants this is not the case. The graph to the left shows the aggregate throughput of two writer threads inserting the same data used in the first benchmark.

Figure: LevelDB with two threads writing 128-byte values.

The first three LevelDB variants in the graph actually show a decrease in overall throughput, despite having twice the computing power available. Until recently, HyperLevelDB would have shown a similar decrease, but as the graph shows, our recent optimizations raise its throughput to over 350K operations per second with the second thread.

As more threads are added, the other LevelDB variants continue to degrade, while HyperLevelDB's throughput increases with each additional thread. The graph to the right shows the throughput with four threads on our quad-core system, where HyperLevelDB's throughput is 2 to 4 times higher than that of the other variants.

Figure: LevelDB with four threads concurrently writing 128-byte values.
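The benchmarks above share one shape: N writer threads insert 128-byte values as fast as they can, and the metric is aggregate operations per second. The following is a minimal C++ sketch of that shape, not the harness we actually used. `CoarseStore` is a hypothetical in-memory stand-in (a `std::map` behind one `std::mutex`), not LevelDB, and `RunWriteBenchmark` is an illustrative name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical toy store: one mutex guards all state. This is NOT LevelDB;
// it only gives the harness something to call.
class CoarseStore {
 public:
  void Put(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> guard(mu_);
    data_[key] = value;
  }
  std::size_t Size() {
    std::lock_guard<std::mutex> guard(mu_);
    return data_.size();
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::string> data_;
};

// Runs `threads` writers, each inserting `ops_per_thread` distinct keys with
// 128-byte values as fast as it can, and returns aggregate operations/sec.
template <typename Store>
double RunWriteBenchmark(Store& store, int threads, int ops_per_thread) {
  assert(threads > 0 && ops_per_thread > 0);
  const std::string value(128, 'x');  // 128-byte payload
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&store, &value, t, ops_per_thread] {
      for (int i = 0; i < ops_per_thread; ++i) {
        // Keys are unique per thread, so every Put is a fresh insert.
        store.Put(std::to_string(t) + ":" + std::to_string(i), value);
      }
    });
  }
  for (auto& w : workers) w.join();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return (static_cast<double>(threads) * ops_per_thread) / elapsed.count();
}
```

Running it with one thread and then with four (each time on a fresh store) gives a rough scaling comparison. Because every `Put` here serializes on a single mutex, extra threads will usually add little, which is the kind of contention the changes below target.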

HyperLevelDB's performance stems from the following changes:

Reduce the time locks are held: Where possible, we shorten the period during which locks are held. The less time a thread spends holding a lock, the less likely it is that another thread will try to acquire that lock in the meantime and have to wait.
Use fine-grained locking: LevelDB's design uses a single mutex to protect all internal state. Each thread acquires this mutex before modifying any internal state and releases it only when done. In HyperLevelDB, we have switched to finer-grained locks that let multiple threads manipulate the internal state concurrently without any loss of safety.
Employ lock-free data structures: We've modified several internal structures to be lock-free, eliminating the need for blocking synchronization such as mutexes.
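To make the first change, shorter lock hold times, concrete: a minimal sketch, assuming a hypothetical shared append-only `WriteLog` (not a HyperLevelDB class), contrasts formatting a record while holding the lock (`AppendSlow`) with formatting it first and locking only for the final append (`AppendFast`). The record encoding is invented purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical append-only log shared by many writer threads.
class WriteLog {
 public:
  // Long hold: all formatting work happens while the lock is held.
  void AppendSlow(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> guard(mu_);
    records_.push_back(Encode(key, value));
  }

  // Short hold: the record is built before taking the lock, so the
  // critical section shrinks to moving it into the vector.
  void AppendFast(const std::string& key, const std::string& value) {
    std::string record = Encode(key, value);  // no lock held here
    std::lock_guard<std::mutex> guard(mu_);
    records_.push_back(std::move(record));
  }

  std::vector<std::string> Snapshot() {
    std::lock_guard<std::mutex> guard(mu_);
    return records_;
  }

 private:
  // Illustrative encoding: 4-byte little-endian key length, key, value.
  static std::string Encode(const std::string& key, const std::string& value) {
    assert(key.size() <= UINT32_MAX);
    std::string out;
    out.reserve(4 + key.size() + value.size());
    uint32_t n = static_cast<uint32_t>(key.size());
    for (int i = 0; i < 4; ++i) {
      out.push_back(static_cast<char>((n >> (8 * i)) & 0xff));
    }
    out += key;
    out += value;
    return out;
  }

  std::mutex mu_;
  std::vector<std::string> records_;
};
```

Both methods store identical records; only the length of the critical section differs. The `push_back` can still reallocate while the lock is held, so this shortens the hold time rather than eliminating it.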
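For the second change, one common way to replace a single global mutex with finer-grained locks is lock striping. The sketch assumes a hypothetical `ShardedStore` that hashes each key to one of 16 shards, each with its own mutex, so writers touching different shards never contend. It illustrates the general technique only; it is not how HyperLevelDB partitions its internal state, and the shard count is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Hypothetical lock-striped store: state is split into independent shards,
// each guarded by its own mutex instead of one mutex for everything.
class ShardedStore {
 public:
  static constexpr std::size_t kShards = 16;  // arbitrary illustrative value

  void Put(const std::string& key, const std::string& value) {
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> guard(s.mu);  // locks one shard only
    s.data[key] = value;
  }

  // Returns true and fills *value if the key exists.
  bool Get(const std::string& key, std::string* value) {
    assert(value != nullptr);
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> guard(s.mu);
    auto it = s.data.find(key);
    if (it == s.data.end()) return false;
    *value = it->second;
    return true;
  }

  // Locks shards one at a time, so the total is not an atomic snapshot
  // while writers are running concurrently.
  std::size_t Size() {
    std::size_t total = 0;
    for (Shard& s : shards_) {
      std::lock_guard<std::mutex> guard(s.mu);
      total += s.data.size();
    }
    return total;
  }

 private:
  struct Shard {
    std::mutex mu;
    std::map<std::string, std::string> data;
  };

  Shard& ShardFor(const std::string& key) {
    return shards_[std::hash<std::string>{}(key) % kShards];
  }

  std::array<Shard, kShards> shards_;
};
```

The trade-off is that operations spanning many keys (like `Size`) must visit several locks, which is usually acceptable when single-key writes dominate.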
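The third change replaces blocking synchronization with atomic operations. As a generic illustration, not one of HyperLevelDB's actual internal structures, here is a push-only Treiber stack: threads publish nodes with a compare-and-swap loop on an atomic head pointer instead of taking a mutex. `LockFreeStack` and `PushFromThreads` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal lock-free (Treiber) stack supporting concurrent Push plus a
// single-threaded drain. Omitting concurrent Pop sidesteps the ABA and
// memory-reclamation problems a full lock-free stack must solve.
class LockFreeStack {
 public:
  LockFreeStack() : head_(nullptr) {}
  ~LockFreeStack() { Clear(); }
  LockFreeStack(const LockFreeStack&) = delete;
  LockFreeStack& operator=(const LockFreeStack&) = delete;

  // Safe to call from many threads at once; no mutex is involved.
  void Push(int value) {
    Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
    // Retry until head_ is swung from node->next to node. On failure,
    // compare_exchange_weak reloads the current head into node->next.
    while (!head_.compare_exchange_weak(node->next, node,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    }
  }

  // Frees all nodes and returns how many there were.
  // Call only when no Push is in flight.
  std::size_t Clear() {
    Node* n = head_.exchange(nullptr, std::memory_order_acquire);
    std::size_t count = 0;
    while (n != nullptr) {
      Node* next = n->next;
      delete n;
      n = next;
      ++count;
    }
    return count;
  }

 private:
  struct Node {
    int value;
    Node* next;
  };
  std::atomic<Node*> head_;
};

// Several threads push concurrently; returns the total number of nodes.
std::size_t PushFromThreads(int threads, int per_thread) {
  assert(threads > 0 && per_thread >= 0);
  LockFreeStack stack;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&stack, per_thread] {
      for (int i = 0; i < per_thread; ++i) stack.Push(i);
    });
  }
  for (auto& w : workers) w.join();
  return stack.Clear();
}
```

Whether `std::atomic<Node*>` is genuinely lock-free depends on the platform; `head_.is_lock_free()` reports it, and it is on mainstream 64-bit targets.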

Final Thoughts

HyperLevelDB is free and open source. It provides the same API as LevelDB and maintains compatibility with its on-disk format. If you want to improve the performance of an application that uses LevelDB, you might try dropping in HyperLevelDB as a replacement.

Some additional resources that may be of interest:

level-hyper is the Node.js wrapper for HyperLevelDB. It lets you use HyperLevelDB from within Node.js and take advantage of many of our optimizations. Internally, level-hyper uses a thread pool to issue writes to the database, so it benefits from the concurrency improvements described above even in single-threaded Node apps.
HyperDex.org is the home of HyperDex, a distributed key-value and document store built on top of HyperLevelDB. Many of our changes to HyperLevelDB are driven by HyperDex.
If you like the improvements we're making to LevelDB and use HyperLevelDB in your application, consider helping fund its further development. We provide support contracts for HyperLevelDB.
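The level-hyper entry above relies on a simple idea: one application thread hands writes to a pool of workers, which issue them concurrently and so exercise HyperLevelDB's multi-writer path. We don't reproduce level-hyper's internals here; the following is a minimal C++ sketch of that pattern, where `WritePool` and `Submit` are hypothetical names and each submitted task would, in a real program, wrap a database write.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size worker pool: a single caller enqueues write tasks
// and the workers execute them concurrently. Not level-hyper's actual code.
class WritePool {
 public:
  explicit WritePool(std::size_t workers) {
    assert(workers > 0);
    for (std::size_t i = 0; i < workers; ++i) {
      threads_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Lets workers finish every queued task, then joins them.
  ~WritePool() {
    {
      std::lock_guard<std::mutex> guard(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : threads_) t.join();
  }

  WritePool(const WritePool&) = delete;
  WritePool& operator=(const WritePool&) = delete;

  // Called from the application thread; returns without waiting.
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> guard(mu_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // run the write outside the queue lock
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> threads_;
};
```

The caller never blocks on the database itself; the number of workers bounds how many writes are in flight at once, which is what lets a single-threaded program benefit from a store that scales with concurrent writers.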
