When PR says No but Engineering says Yes

mongo broken February 07, 2013 at 02:37 PM Emin Gün Sirer

Looks like my post on how MongoDB is broken by design got a response from 10gen PR. It's an illustrative example of how corporate PR can let down regular developers.

First, the 10gen spokesperson seems to have read only the H2 elements on my original writeup -- he seems to not have read or understood the actual text that goes along with them. Do they have some local CSS applied that makes regular text invisible?

I had a few words to say about this TL;DR-culture at the end of my post. It's sad when developers can't be bothered to read things, but that's, at some level, understandable -- we're all pressed for time (e.g. I'm writing this note in an airport lounge while traveling). But if you're the spokesperson at a company, and someone does a careful analysis of your flagship product and says "hey guys, looks like you goofed up," you probably should read the technical reasoning, no?

This is not a game of making correct-in-a-narrow-sense but misleading statements. The developer community will not have any sympathy for you if they trust your product and later find that their trust was misplaced. My connection is too flaky to check the proggit and HN discussion, but I suspect the dev community will not be kind.

Here's my quick pass:

Issue #1: I said that MongoDB v2.0 lies. My statement included the version number in it. I then described how the fixes to the defaults issued in v2.2 are insufficient. The response pretends that 5 years of brokenness never took place, and it does not address, in any shape or form, the concern that the fix is insufficient. With the old lying Mongo, a single client crash could lead to data loss. With the new much-improved Mongo, a single server crash can lead to data loss. Just because a client gets a confirmation from a single server does not mean that the data has been recorded anywhere except in the volatile memory of that single server. Even if it were recorded on disk, the fault-tolerance guarantee would not change at all (though the nature of the fault would change, from node failure to disk failure -- perhaps Mongo assumes that disks don't fail?). How many faults does it take to lose data when all you have is a single copy on a single host?

This is not what normal people mean by fault-tolerant.

Issue #2: 10gen misses the point:

Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.

There is some unavoidable waiting to be done to get the data to be committed. But getLastError incurs an additional latency and overhead for the client to communicate its desire to wait for that commit. And the response is clearly failing to account for the difference. It's as if there is no difference between the two. MongoDB is evidently building its systems as if networks have 0 latency, network bandwidth is a free commodity, and NICs have no overheads.

When making a pact with the devil, be sure to draw up a bullet-proof contract. For he may take your integrity and not give the "speed" you seek.

That's fine by me. I've gotten countless questions on "what makes HyperDex so fast? what's your secret sauce?" We've detailed some of the secret sauce, but people still find it hard to believe that HyperDex can be so fast (3000 ops/sec; for comparison, the carefully designed Cassandra code gets 1500 ops/sec) and so much faster than MongoDB (at 6 ops/sec) that the latter is unable to finish a benchmark. I hope this demonstrates the difference between keeping performance tradeoffs front and center throughout your design and implementation, and acting like physics don't apply to you.

Anyhow, if they're happy with their performance, who am I to complain? It's weird, though, to give up consistency and fault-tolerance for performance, but then to fail at achieving performance as well. If you're going to sell your soul and your data integrity to the devil in exchange for speed, well, make sure he delivers, or else it'll look really bad.

Issue #3: 10gen confirms that getLastError does not work when pipelined. Anyone can see through the attempt to spin the bug as a feature.

I agree with them that not every application needs every write confirmed. I covered this in my article on when data is worthless. It's a slippery slope, one that I suspect most developers will find difficult to navigate.

I note with concern that the pattern recommended by 10gen here is broken. If you pipeline 10000 inserts, call getLastError, and see success, you cannot count on the other 9999 inserts having been committed without errors. It's a sharded data store. The previous operations may have hashed onto any set of servers, and some of those servers may well fail or have failed. The success message you see about the last write implies nothing about the success of the preceding requests.
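To make the sharding argument concrete, here is a minimal toy model in Java. It does not use the MongoDB driver at all: the Shard, Outcome, and pipeline names are made up for illustration, and the key-modulo-shard-count placement is an assumption standing in for whatever hashing a real deployment uses. It only shows why the status of the final write says nothing about the writes that came before it.

```java
// Toy model only: no MongoDB driver is involved and every name is hypothetical.
class PipelinedWritesSketch {

    // A shard that is either up (applies writes) or down (drops them).
    static final class Shard {
        final boolean up;
        int stored = 0;

        Shard(boolean up) { this.up = up; }

        // Returns true only if the write was applied.
        boolean write(int key) {
            if (!up) return false;
            stored++;
            return true;
        }
    }

    // What the client observed versus what actually happened.
    static final class Outcome {
        final boolean lastWriteOk;
        final int lostWrites;

        Outcome(boolean lastWriteOk, int lostWrites) {
            this.lastWriteOk = lastWriteOk;
            this.lostWrites = lostWrites;
        }
    }

    // Sends keys 0..n-1 back to back and remembers only the status of the final
    // write, which is all a single trailing status check can report in this model.
    static Outcome pipeline(Shard[] shards, int n) {
        boolean lastOk = false;
        int lost = 0;
        for (int key = 0; key < n; key++) {
            // Assumed placement rule: key modulo shard count.
            Shard target = shards[key % shards.length];
            lastOk = target.write(key);
            if (!lastOk) lost++;
        }
        return new Outcome(lastOk, lost);
    }

    public static void main(String[] args) {
        // Four shards, shard 1 is down. The last key (9999) maps to shard 3, which is up.
        Shard[] shards = { new Shard(true), new Shard(false), new Shard(true), new Shard(true) };
        Outcome o = pipeline(shards, 10000);
        System.out.println("last write ok: " + o.lastWriteOk + ", lost writes: " + o.lostWrites);
        // prints "last write ok: true, lost writes: 2500"
    }
}
```

In this model the client sees success on the trailing check while a quarter of the batch never landed anywhere.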

Issue #4: MongoDB forums and blog posts are full of messages about how one can use getLastError to check on the outcome of the last operation. I pointed out that if you follow their advice and write code that looks like this:

db.insert(...);
DBObject err = db.getLastError();

then your code may very well see the result of a completely different operation that was last performed on that connection by another thread.
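The interleaving can be replayed deterministically with a small toy model. This is a sketch under stated assumptions, not the driver's implementation: Connection, Pool, and Db are hypothetical classes, the pool holds a single connection to keep the example short, and the two "threads" are simulated by issuing their calls in a fixed order. The one property it borrows from the real system is that each call checks out a connection, uses it, and releases it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: hypothetical classes, not the MongoDB Java driver.
class SharedConnectionSketch {

    // A pooled connection remembers only the outcome of its most recent operation.
    static final class Connection {
        String lastError = null;
    }

    static final class Pool {
        private final Deque<Connection> idle = new ArrayDeque<>();

        Pool(int size) {
            for (int i = 0; i < size; i++) idle.add(new Connection());
        }

        Connection checkOut() { return idle.removeFirst(); }
        void release(Connection c) { idle.addFirst(c); }
    }

    static final class Db {
        private final Pool pool;

        Db(Pool pool) { this.pool = pool; }

        // Borrow a connection, record the outcome on it, release it.
        void insert(String doc, boolean fails) {
            Connection c = pool.checkOut();
            c.lastError = fails ? "error inserting " + doc : null;
            pool.release(c);
        }

        // A separate call: borrows whichever connection the pool hands out now.
        String getLastError() {
            Connection c = pool.checkOut();
            String err = c.lastError;
            pool.release(c);
            return err;
        }
    }

    // Replays one possible interleaving of two threads, step by step.
    static String interleaving() {
        Db db = new Db(new Pool(1));
        db.insert("A's document", true);   // thread A: this insert fails
        db.insert("B's document", false);  // thread B sneaks in and succeeds
        return db.getLastError();          // thread A: reads B's outcome
    }

    public static void main(String[] args) {
        System.out.println("thread A sees: " + interleaving());
        // prints "thread A sees: null"
    }
}
```

Thread A's insert failed, yet the error check it issues afterwards reports no error, because the outcome it reads belongs to thread B's operation.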

Here's the MongoDB documentation that confirms this (skip the first sentence) [1].

The Java MongoDB driver is thread safe. If you are using in a web serving environment, for example, you should create a single Mongo instance, and you can use it in every request. The Mongo object maintains an internal pool of connections to the database (default pool size of 10). For every request to the DB (find, insert, etc) the Java thread will obtain a connection from the pool, execute the operation, and release the connection. This means the connection (socket) used may be different each time.

Let's get back to that first sentence and use this opportunity to talk about "thread-safe" vs "system-safe." The Mongo driver may indeed be thread-safe, in that it uses locks correctly internally and maintains its own invariants for its own correct operation. But the programs that use it are a different matter. I can build a component that is internally thread-safe, and yet its API makes it difficult or impossible for threaded programs to maintain their own correctness invariants.

Until recently, MongoDB did not talk about requestStart() and requestDone() in any context except when talking about how to ensure a very weak consistency requirement. Namely, if you don't use this pair of operations, then a write to the database followed by a read from the database, by the same client, can return old values. So, I write 42 for key k with a WriteConcern.SAFE, read key k, and get some other number, because the Mongo driver can, by default, very well send the first request to one node over one connection, and the second one to another, over another connection. So requestStart() and requestDone() were billed as a mechanism to avoid that scenario; I saw no mention that they were required for correctness in multithreaded settings. I bet there is plenty of multithreaded code that does not follow that pattern. Such code is broken; if you're a Mongo user, it'd be a good idea to check if you ever use getLastError without a bracketing requestStart() and requestDone().
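The bracketing pattern can be sketched with a toy model that pins one connection to the calling thread. The requestStart() and requestDone() methods below are named after the driver calls discussed above, but the Db and Connection classes and the ThreadLocal-based pinning are assumptions made for illustration; this is not the driver's code.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Toy model only: hypothetical classes that mimic the bracketing pattern.
class PinnedConnectionSketch {

    static final class Connection {
        volatile String lastError = null;
    }

    static final class Db {
        private final ConcurrentLinkedDeque<Connection> idle = new ConcurrentLinkedDeque<>();
        // The connection reserved for the current thread, if any.
        private final ThreadLocal<Connection> pinned = new ThreadLocal<>();

        Db(int poolSize) {
            for (int i = 0; i < poolSize; i++) idle.add(new Connection());
        }

        // Reserve one connection for this thread until requestDone().
        void requestStart() { pinned.set(idle.removeFirst()); }

        void requestDone() {
            idle.addFirst(pinned.get());
            pinned.remove();
        }

        void insert(String doc, boolean fails) {
            Connection c = pinned.get();
            boolean borrowed = (c == null);
            if (borrowed) c = idle.removeFirst();
            c.lastError = fails ? "error inserting " + doc : null;
            if (borrowed) idle.addFirst(c);
        }

        String getLastError() {
            Connection c = pinned.get();
            if (c != null) return c.lastError; // pinned: same connection as the insert
            c = idle.removeFirst();
            String err = c.lastError;
            idle.addFirst(c);
            return err;
        }
    }

    // Thread A brackets its insert and error check; thread B runs in between.
    static String bracketed() {
        Db db = new Db(2);
        db.requestStart();
        try {
            db.insert("A's document", true);  // thread A: this insert fails
            Thread other = new Thread(() -> db.insert("B's document", false));
            other.start();
            try {
                other.join();                 // thread B finishes before A checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return db.getLastError();         // thread A: still sees its own error
        } finally {
            db.requestDone();
        }
    }

    public static void main(String[] args) {
        System.out.println("thread A sees: " + bracketed());
        // prints "thread A sees: error inserting A's document"
    }
}
```

Because the other thread can only borrow from the idle pool, it never touches the pinned connection, so the error check reports the outcome of the caller's own insert. Nothing in the unbracketed API forces callers to do this.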

Issue #5: What a non-committal answer. What's the Mongo setting for "I don't want you to lose my data?" I'll even specify a very benign fault model: my cluster can have at most one fault at a time. I've paid my tithing and sacrificed all the right kinds of animals to various deities, so I've made sure that a second fault will not occur before my devops team fixes the first fault. What Mongo setting will ensure that I don't lose data?

I already described how all but one of the SAFE settings that Mongo provides are broken, and described how they cannot withstand a single fault. True, I did not yet describe how REPLICA_SAFE is broken. Yes, yes, I know, how dare I write a huge post cataloging a number of bona fide errors that half the Mongo fans, including the very fellow whose job is to respond to the post, did not read beyond the H2 tags, and leave this part to a future post? It's partly because I actually have a full time job and life, but it's mainly because I wanted Mongo to come out and say "REPLICA_SAFE provides a fault-tolerance guarantee." They are steadfastly refusing. Read this fellow's response carefully:

WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.

This says "we give you what you get." I guess if the system loses data, it's always the developer's fault for not understanding the internal workings of Mongo.

[1] MongoDB docs on Java Concurrency.
