What Can An Earthquake Teach Us About Database Design

Quake aftermath

There was a big earthquake in the Aegean Sea this morning. In the 6.7-7.1 range, close to the surface, in a spot that affects cities from Thassaloniki to Istanbul, Alexandroupolis to Izmir, this is the kind of event that has the potential to kill many people. With more than 15 million people shaken violently, there is only one natural human reaction: hit the closest earthquake reporting site. See where the quake was, and check whether anyone you care for could be affected. So this post isn't just technical, it's also quite personal: I wanted to know how my mother was doing.

But when I went to this otherwise-incredibly-well-done earthquake reporting site that I have bookmarked, called Global Incident Map, I was greeted by an empty world map and this explanation:

mysql: too many connections

Technical Description

Quake aftermath

Here's the technical description behind this disgrace and total failure.

MySQL supports only 151 simultaneous connections by default. The pages claim that you can go up to 10000 but only "if you have many gigabytes of RAM available and the workload from each is low or the response time target undemanding."

The maximum number of connections MySQL can support depends on the quality of the thread library on a given platform, the amount of RAM available, how much RAM is used for each connection, the workload from each connection, and the desired response time.

I can hear you say: "wait, why the heck is the number of connections at all related to the quality of my thread library? Why should I need a special relationship with a Taiwanese RAM manufacturer to support a few thousand sockets? Are we now living in the post-rapture world where nothing makes sense and we are continually punished? Did I miss the trumpet call?"

In theory, the MySQL docs claim that we can have as many as 2^32 connections. That's the same theory that says that we, and not just Brian Boitano, could nail a quadruple axle if we just worked at it hard enough. I hate to say it but not everything you put your mind to will actually happen. MySQL hitting 2^32 connections is one of them, for a good reason.

Quake aftermath

As Baron Schwartz explains, the problem here is that MySQL is architected to use a thread per connection. So every client requires a corresponding thread on the server side, along with an associated stack. The more connections you have, the more memory you need, the more your scheduler is burdened, the longer your clients have to wait to get serviced.

There are lots of clasically educated RDBMS people out there who have been scratching their heads since 2005, when people started defecting out of the RDBMS fold, wondering where they went wrong. There are tons of reasons why RDBMSs fell short, and why people decided to switch over to defunct first-generation NoSQL data stores, so many that I cannot catalog them in this quick post. But stuff like this certainly doesn't help. 150 simultaneous clients is absolutely nothing; I routinely get a few times that many simultaneous clients when any blog post of mine, relevant to a fairly small, technically savvy audience, gets any attention. An earthquake reporting site has to support more than 150 measly connections.

Neither I nor any other mortal can name you how many configuration parameters it takes to set up a proper MySQL or Oracle implementation. Not even an army of DBAs and a rolodex full of database optimization consultants can fully guarantee peace of mind. And when you need your database the most, when 15M people and their loved ones are wondering what happened, all they might get may very well be "too many connections."

I don't want to pick on MySQL: it is certainly one of the best tested databases out there, and to their credit, they documented this limitation. The true root cause here is on the application side: slapping components together without regard for their properties; testing only the "sunny day" scenarios that are easy to test, without concern for what happens under load or failure; and architecting services based on hearsay and fashion trends instead of a careful consideration of demands. As a result, our overall IT infrastructure remains fragile, and it's prone to fail when it is needed the most.

Share on Linkedin
Share on Reddit
comments powered by Disqus