Technological hindsight; we don’t has it

I just read a write-up on the latest-and-greatest database to come out in the new db wars. It’s all about how MemSQL can do 80k queries per second on a set that MySQL can only do 3.5k.

First I want to say bullshit. Anyone who has ever done db benchmarking knows that, given enough time to tweak and enough knowledge about the innards of your database du jour, you can work out a way to make one beat the other. It’s the big “MySQL is faster than Oracle!” – “Nuh-uh!” fight that we had back in the ’90s.

But secondly, a decade ago I worked at a (now defunct) advertising company called Active Response Group. We had a simple MySQL server setup with one master and 2 slaves. All the writes went to the master, and most of the reads came from the slaves (we wrote a bit of code that checked how far behind the slaves were and picked the server to read from depending on whether the query needed to be “real-time-ish” or not). Nothing fancy. No sharding. Occasionally we would have to upgrade a slave to be the master, and that was exciting (and usually happened at 2am), but other than that, it was a fairly normal setup.
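
That lag-aware read routing can be sketched in a few lines. This is a hypothetical Python illustration, not the code we actually ran: the names `Replica`, `pick_server`, and `max_lag_seconds` are made up for the example, and the lag figure is assumed to come from somewhere like the `Seconds_Behind_Master` column of `SHOW SLAVE STATUS`.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumed to be measured elsewhere, e.g. Seconds_Behind_Master


def pick_server(master, replicas, needs_fresh, max_lag_seconds=1.0, rng=random):
    """Return the name of the server a read query should be sent to.

    "Real-time-ish" reads only go to replicas that are nearly caught up;
    everything else can go to any replica. If no replica qualifies,
    fall back to the master.
    """
    if needs_fresh:
        candidates = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    else:
        candidates = list(replicas)
    if not candidates:
        return master
    # Pick at random among the eligible replicas to spread the read load.
    return rng.choice(candidates).name
```

With one replica 0.2 seconds behind and another 30 seconds behind, a "real-time-ish" read goes to the first one, and a read that can tolerate stale data goes to either. The 1-second threshold is an arbitrary placeholder; the right value depends on what your queries can live with.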

The database was about 1TB. On a single machine. We did over 35k queries per second reads, and over 20k queries per second writes.

Not quite 80k, but this was a fucking decade ago! Given Moore’s Law, that means that the hardware that we had was roughly 20x slower than what we have now. Given the same setup on modern hardware, with similar use requirements and knowledge of our data, we should be able to push around 700,000 queries per second for reads!

WHY THE FUCK IS 80K NEWS?!

At the (albeit very speculative) capacity that I said we should be able to pull off, the bottleneck would be the pipe itself. We would saturate the network before we stopped serving up queries or queueing connections.
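
For a rough sanity check on that, here is the back-of-the-envelope math as a small Python sketch. The 35k reads per second and the 20x speedup are the figures above; the 2 KB average response size and the 10 Gbit/s link are pure assumptions picked for illustration, so swap in your own numbers.

```python
READS_QPS_THEN = 35_000     # reads per second we did a decade ago
HARDWARE_SPEEDUP = 20       # rough speedup estimate from above
AVG_RESPONSE_BYTES = 2_048  # ASSUMPTION: average result size per query
LINK_GBITS = 10             # ASSUMPTION: network link capacity in Gbit/s


def projected_qps(qps_then, speedup):
    """Scale an old throughput figure by a hardware speedup factor."""
    return qps_then * speedup


def required_gbits(qps, avg_response_bytes):
    """Bandwidth needed to ship that many responses, in Gbit/s."""
    return qps * avg_response_bytes * 8 / 1e9


qps_now = projected_qps(READS_QPS_THEN, HARDWARE_SPEEDUP)
needed = required_gbits(qps_now, AVG_RESPONSE_BYTES)
print(f"{qps_now:,} reads/s needs about {needed:.1f} Gbit/s "
      f"on a {LINK_GBITS} Gbit/s link")
```

Under those assumptions the projected read load wants more bandwidth than the link has, which is the point: the wire gives out before the database does. With much smaller responses or a fatter pipe the conclusion could flip.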

I think the tech world has a lack of hindsight. At the time that we were doing 35k qps, we weren’t even the biggest player in the game. We had peer companies in the same industry doing easily double our traffic and similarly double our database load. Why have we forgotten? Why do these jokers who said, “We’ll make it faster by ignoring the fact that memory is volatile; I bet people don’t like their data that much anyway…” get such big press for doing worse than what we did just a few years ago?

I believe it’s because of 2 things: 1) Engineers have a strong case of “not built here” syndrome, and 2) The NoSQL hype lived up to exactly one person’s expectations and was just a giant headache for everyone else. When someone comes along and says “remember how easy MySQL is? We can make it FAST!” it’s pretty tempting. Put it together, and what have you got? Bibbidi-bobbidi-boo… Erm… this horse pucky.

Now, hacking on the MySQL codebase is no easy task. I’ve done it before (it was part of how we got that speed 10 years ago). Hard-core memory management is not for the faint of heart (I would argue that they probably shouldn’t be doing it, but I haven’t seen their code; it’s possible something magic is going on). They seem like smart guys doing smart-guy things. I just don’t think it’s such hot shit as to be called “The fastest database on the planet.”

The lesson has been learned, found, taught, re-taught, and re-learned by every large-scale developer on the planet many times over:

The fastest database on the planet is the one correctly chosen, tuned, and queried against for your data. Know your data and the database will be fast. Just shoving more shit in memory might give you enough of a short-term gain to get your Christmas bonus, but fuck you if you’re that guy.

Oh, and one more thing: If it’s all in memory, it’s not ACID.