LevelDB is a popular data back end that was originally developed by Google for use as a standalone key-value store library. Since Google open-sourced LevelDB, it has been widely adopted and modified by others, including Facebook, Basho, and us at HyperDex. While LevelDB provides a solid foundation for building data-intensive applications, there are many ways in which its performance can be improved for a variety of workloads.
In this article, we look at how some recent improvements to HyperLevelDB, the HyperDex fork of LevelDB, improve concurrency for multiple writers.
The various LevelDB forks provide decent performance with a single writer thread. The graph to the right shows the performance of LevelDB with a single writer that inserts 128-byte objects as fast as it can. HyperDex's and Basho's forks exhibit significantly higher throughput than LevelDB or RocksDB, reaching about 275K operations per second in this single thread. Google's LevelDB and Facebook's RocksDB achieve more modest throughput.
If you were building an application on LevelDB, you might be tempted to scale it by adding more threads. After all, modern servers, even virtual servers, have multiple CPU cores available. Intuitively, we would expect a second writer thread to increase the application's total throughput, but this is not the case. The graph to the left shows the aggregate throughput of two writer threads inserting the same data used in the first benchmark.
The first three LevelDB variants actually show a decrease in overall throughput, despite having twice the computing power available. Until recently, the HyperLevelDB benchmark would have shown a similar decrease in throughput but, as the graph shows, our recent optimizations increase throughput to over 350K operations per second with the second thread.
As concurrency increases with additional threads, other LevelDB variants continue to see their performance degrade, while HyperLevelDB performance increases with each additional thread. The graph to the right shows the throughput when running four threads on our quad-core system. HyperLevelDB's throughput is 2-4 times higher than the other variants.
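The measurement described above (several writer threads inserting 128-byte objects as fast as they can, with throughput reported in aggregate) can be approximated with a small harness. The sketch below is not the benchmark used for the graphs: `ToyStore`, `BenchResult`, and `RunWriters` are hypothetical names, and the store is an in-memory map behind a single mutex, chosen only so the example compiles without any LevelDB variant installed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a key-value store (an assumption, not LevelDB).
// A single mutex guards every write, so writers serialize regardless of
// how many cores are available.
class ToyStore {
 public:
  void Put(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> guard(mu_);
    data_[key] = value;
  }

  std::size_t Size() {
    std::lock_guard<std::mutex> guard(mu_);
    return data_.size();
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::string> data_;
};

struct BenchResult {
  std::size_t total_ops;   // writes completed across all threads
  double seconds;          // wall-clock time for the whole run
  double ops_per_second;   // aggregate throughput
};

// Starts `num_threads` writers, each inserting `ops_per_thread` distinct keys
// with 128-byte values, and reports the aggregate throughput.
BenchResult RunWriters(ToyStore& store, int num_threads,
                       std::size_t ops_per_thread) {
  assert(num_threads > 0);
  const std::string value(128, 'x');  // 128-byte object, as in the article

  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> writers;
  for (int t = 0; t < num_threads; ++t) {
    writers.emplace_back([&store, &value, t, ops_per_thread] {
      for (std::size_t i = 0; i < ops_per_thread; ++i) {
        // Prefix keys with the thread index so no two threads collide.
        store.Put(std::to_string(t) + ":" + std::to_string(i), value);
      }
    });
  }
  for (auto& writer : writers) writer.join();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;

  BenchResult result;
  result.total_ops = static_cast<std::size_t>(num_threads) * ops_per_thread;
  result.seconds = elapsed.count();
  // Guard against a zero-length interval on very short runs.
  result.ops_per_second =
      result.seconds > 0.0 ? result.total_ops / result.seconds : 0.0;
  return result;
}
```

Calling `RunWriters` with one, two, and four threads on a fresh `ToyStore` and comparing `ops_per_second` gives the same kind of comparison the graphs show. Because this toy store funnels every write through one lock, it would not be expected to scale with thread count; swapping in a real store behind the same `Put` call is left as an exercise.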
HyperLevelDB's performance stems from the following changes:
HyperLevelDB is free and open source. It provides an API identical to LevelDB's and maintains compatibility with the on-disk format. If you are interested in improving the performance of an application that uses LevelDB, you might want to try dropping in HyperLevelDB.
Some additional resources that may be of interest: