
Posts Tagged ‘Innodb’

MySQL 5.5.4 looks awesome.

April 15, 2010

Been at the MySQL conference the last few days, and I have to say, I’m really blown away by MySQL 5.5.4’s improvements.  Last year I keynoted and I begged Oracle on stage to realize that MySQL and InnoDB under one roof represented opportunity.  It’s clear they heard the community – this is some serious progress, and right when we needed it.

Jeremy Zawodny’s blog post covers most of the stuff I’m really excited about, and there are some great detailed technical slides here and here, but I wanted to go into a little more detail on one important improvement:  We’ve been plagued by MySQL’s undo slot limits for an awfully long time.  Basically, you could have 512 INSERT transactions and 512 UPDATE transactions running at once, for a grand total of 1024.  If you use INSERT … ON DUPLICATE KEY UPDATE, though, it takes two of those spots, meaning you get 512 concurrent transactions.  On modern hardware, it’s trivially easy to hit this limit.
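For illustration, this is the kind of statement that burns both an INSERT and an UPDATE undo slot at once (the table and column names here are made up):

# hypothetical example -- an upsert that counts against both undo slot types
mysql -e "INSERT INTO counters (id, hits) VALUES (42, 1) ON DUPLICATE KEY UPDATE hits = hits + 1;"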

I’ve had an Enterprise support ticket open for years on the issue, there’s been a MySQL bug for a long time, and there was basically no movement.  In fact, I’d gotten so frustrated about this issue, I’d basically decided this year was our last year of Enterprise MySQL support.  It was one of the only reasons we paid for support for the last few years – the promise that a fix was just around the corner.  I felt good about voting with my dollars, and contributing back to a core technology we depend on, but enough was enough.

Lo and behold, it’s fixed!  You can now have a whopping 128K transactions in flight (128 rollback segments of 1,024 undo slots each, instead of the old single segment).  Best of all, it’s far more performant than it used to be!  And craziest of all?  If you run 5.5.4 on a database, then roll back to some older release, the change still takes effect.  Backwards bug and performance fixing – that’s a new one on me.

THANK YOU ORACLE!

Shameless plug – we’re hiring. And it’s a blast.

Categories: datacenter, MySQL

Great things afoot in the MySQL community

December 23, 2008

tl;dr: The MySQL community rocks. Percona, XtraDB, Drizzle, SSD storage, InnoDB IO scalability challenges.

For anyone who lives and dies by MySQL and InnoDB, things are finally starting to heat up and get interesting. I’ve been banging the “MySQL/InnoDB scales poorly” drums for years now, and despite having paid Enterprise licenses, I haven’t been able to get anywhere. I was pretty excited when Sun bought MySQL since their future is intrinsically tied to concurrency, but things have been pretty slow going over there this year.

But the community has finally taken up arms and is fighting the good fight. It’s (finally!) a great time to be a MySQL user because there’s been lots of recent progress. Here’re some of my favorites (and highlights of work left to do):

PERCONA

I can’t sing Percona’s praises enough. They’re probably the most knowledgeable MySQL experts out there (possibly even including Sun). Absolutely the best bang for the buck in terms of MySQL service and support – better than MySQL’s own offering. (If I had to guess why that is, I’d bet that MySQL/Sun don’t want to step on Oracle’s toes by fixing InnoDB – but >99% of what we need is related to InnoDB. Percona has no such tip-toeing limitations.) Let me quickly count the ways they’ve helped me in the last few months:

  • They knew of a super obscure configuration setting, “back_log”. Have you ever heard of it? I hadn’t. But we started seeing latency on MySQL connections (up to *3 seconds*!) on systems that hadn’t changed recently (exactly 3 seconds sounded awfully suspicious, and sure enough, it was TCP retries). After going through every single kernel, network, and MySQL tuning parameter I know (and I know a lot), I finally called Percona. They dug in, investigated the system, and unearthed back_log within an hour or two. Popped that into my configuration (see the sketch after this list) and boom, everything was fine again. Whew!
  • We have servers that easily exceed InnoDB’s transaction limits. Did you know InnoDB has a concurrent transaction limit of 1024? (Technically, 1024 INSERTs and 1024 UPDATEs. But INSERT … ON DUPLICATE KEY UPDATE manages to chew up one of each.) I know all about it – I’ve had bugs open with MySQL Enterprise for more than 2 years on the issue. What’s more, these are low-end systems – 4 cores, 16GB of RAM – and they’re nowhere near CPU or IO bound. It took MySQL months to figure out what the problem was (years, really, to figure out all the final details like the different undo logs for INSERT vs UPDATE). Their final answer? It’ll be fixed in MySQL 6. 😦 Note that 5.1 *just* went GA after years and years. On the other hand, it took Percona one weekend to diagnose the problem, and 13 days to have a preliminary patch ready to extend it to 4072 undo slots. Talk about progress! (And yes, we want Percona to release the patch to the world)
  • Solving the CPU scaling problems. These have been plaguing us for years (we have had some older four-socket systems for a while … now with quad-core, it’s even worse), and thanks to Google and Percona, this problem is well on its way to being solved. We’re sponsoring this work and can’t wait to see what happens next.
  • XtraDB. This is the biggy. So big it deserves its own heading….
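Before we get to XtraDB: here’s roughly what the back_log change from the first bullet boils down to. The value shown is illustrative, not Percona’s exact recommendation for our boxes.

# in /etc/my.cnf, under [mysqld] -- back_log sizes the listen backlog for
# incoming connection requests (it isn't dynamic, so a restart is needed):
#   back_log = 512
# then confirm it took:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'back_log';"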

XTRADB

Oracle’s done a terrible job of supporting the community with InnoDB. The conspiracy theorists can all say “I told you so! Oracle bought them to halt MySQL progress” now – history supports them. Which is a shame – Heikki is a great guy and has done amazing work with InnoDB, but the fact remains that it wasn’t moving forward. The InnoDB plugin release was disappointing, to say the least. It addressed none of the CPU or IO scalability issues the community has been crying about for years.

Luckily, Percona finally did what everyone else has been too afraid to do – they forked InnoDB. XtraDB is their storage engine, forked from InnoDB (and then turbocharged!). We’re not running it in production yet, but we are running all of the patches that went into XtraDB and I can tell you they’re great. We’re sponsoring more XtraDB development (and yes, we made sure Percona will be contributing anything they build for us back to the community) with Percona, and I’m sure that’ll continue.

DRIZZLE

I’ve already blogged a bit about Drizzle, but it sure looks like Drizzle + XtraDB might be a match made in heaven. Drizzle can be thought of as a MySQL engine re-write with an eye towards web workloads and performance, rather than features. MySQL 4.1, 5.0, and 5.1 added a lot of features that bloated the code without offering anything really useful to web-oriented workloads like ours, so the Drizzle team is ripping all that stuff back out and rethinking the approaches to the things that are being left in. Very exciting.

SSD STORAGE

The advent of “cheap enough” super-fast SSD storage is finally upon us. I’ve got Sun S7410 storage appliances in production and they’re blazingly fast. I have a very thorough review coming, but the short version is that even with NFS latencies, we’re able to do obscene write workloads to these boxes (let alone reads). 10000+ write IOPS to 10TB of mirrored, crazy durable (thanks ZFS!) storage is a dream come true. Once you mix in snapshots, clones, replication, and Analytics – well, it just doesn’t get much better than this.

(Don’t get sticker shock looking at the web pricing – no one pays anything even remotely like that. Sign up for Startup Essentials if you can, or talk to your Sun sales rep if you can’t, and you can get them much cheaper. I nearly had a heart attack myself until I got “real” pricing. Tell them I sent you – enough Sun people read this blog, it might just help 🙂 ).

STILL NEEDED…

So, all in all, there’s been an awful lot of progress this year, which is great. CPUs are finally scaling under InnoDB, and we finally have storage that isn’t bounded by physical rotation and mechanical arms. Unfortunately, great CPU scaling plus amazing IO capabilities isn’t something InnoDB digests very well. As is common in complicated systems, once you fix one bottleneck, another one elsewhere in the system crops up. This time, it’s IOPS. It was eerie reading Mark Callaghan’s post about this last night – I’d come to the exact same conclusions (from an Operations point of view rather than code-level) just yesterday.

Bottom line: Despite having ample CPU and ample IO, InnoDB isn’t capable of using the IO provided. You can bet we’ll be working with Percona, Google and Sun (read: sitting back and admiring their brilliant work while writing the occasional check and providing production workload information) to look into fixing this.

In the meantime, we’re back to the old standbys: replication and data partitioning. Yes, we’re stacking lots of MySQL instances on each S7410 to maximize both our IOPS and our budget. Fun stuff – more on that later. 🙂

UPDATE: Just occurred to me that there are plenty of *new* readers to my blog who haven’t heard me praise Google and their patches before. Mark Callaghan’s team over at Google definitely deserves a shout-out – they’ve really been a catalyst for much of this work along with Percona.

MySQL and the Linux swap problem

Ever since Peter over at Percona wrote about MySQL and swap, I’ve been meaning to write this post. But after I saw Dathan Pattishall’s post on the subject, I knew I’d better actually do it. 🙂

There’s a nasty problem with Linux 2.6 even when you have a ton of RAM. No matter what you do, including setting /proc/sys/vm/swappiness = 0, your OS is going to prefer swapping stuff out rather than freeing up system cache. On a single-use machine, where the application is better at utilizing RAM than the system is, this is incredibly stupid. Our MySQL boxes are a perfect example – they run only MySQL and we want InnoDB to have a lot of RAM (32-64GB … and we’re testing 128GB).
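For reference, here’s the knob in question – and again, even at 0 it didn’t stop the swapping for us:

# tell the VM to prefer keeping application pages over page cache (didn't help in our case)
echo 0 > /proc/sys/vm/swappiness
# or via sysctl; add "vm.swappiness = 0" to /etc/sysctl.conf to make it stick across reboots
sysctl -w vm.swappiness=0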

You can’t just not have any swap partitions, though, or kswapd will literally dominate one of your CPU cores doing who-knows-what. But you can’t have it swapping to disk, or your performance goes into the toilet. So what to do?

Our solution is to make swap partitions out of RAM disks. Yes, I realize how insane that sounds, but the Linux kernel’s insanity drove us to it. Best part? It works. Here’s how:

mkdir /mnt/ram0                                              # mount point for the ramdisk
mkfs.ext3 -m 0 /dev/ram0                                     # put a filesystem on it, 0% reserved blocks
mount /dev/ram0 /mnt/ram0
dd bs=1024 count=14634 if=/dev/zero of=/mnt/ram0/swapfile    # ~14MB file to hold the swap
mkswap /mnt/ram0/swapfile                                    # format it as swap
swapon /mnt/ram0/swapfile                                    # and turn it on

That’ll give you a ~14MB swap area that’s actually in RAM, so it’s super-fast. This assumes your kernel is creating 16MB ramdisk partitions, but you can adjust your kernel parameters and/or the ‘dd’ line above to suit whatever size you want.
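If you do want bigger ramdisks, the usual knob is the ramdisk_size kernel boot parameter (value in KB – the number below is just an example, not what we run):

# e.g. appended to the kernel line in /boot/grub/grub.conf to get 64MB ramdisks:
#   ramdisk_size=65536
# otherwise, just bump the dd count above (count=14634 at bs=1024 is what yields the ~14MB swapfile)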

We’ve found that anywhere from 20MB-40MB tends to be enough (so use /dev/ram1, /dev/ram2, etc), depending on the load on the box. kswapd no longer uses any noticeable CPU, there are always a few MB of free “swap”, and life is back in the fast lane. Just add those lines to your relevant startup file, like /etc/rc.d/rc.local, and it’ll persist across reboots.
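For the startup-file part, here’s a sketch of what the rc.local entries could look like if you want, say, three ~14MB ramdisk swap areas (tune the count, sizes, and devices to your box – this is illustrative, not our exact config):

# build RAM-backed swap on /dev/ram0 through /dev/ram2 at boot (~42MB total)
for i in 0 1 2; do
    mkdir -p /mnt/ram$i
    mkfs.ext3 -q -m 0 /dev/ram$i
    mount /dev/ram$i /mnt/ram$i
    dd bs=1024 count=14634 if=/dev/zero of=/mnt/ram$i/swapfile 2>/dev/null
    mkswap /mnt/ram$i/swapfile
    swapon /mnt/ram$i/swapfile
done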

Some Linux purists will probably hate this approach, others may have more efficient ways of achieving the same thing, but this works for us. Give it a shot. 🙂

Oh, and I hope it goes without saying, but make *darn* sure you know what you’re running on your box and what the maximum RAM footprint will be before you try running with only 20-40MB of swap. We’ve never OOMed (Out-Of-Memory) a production MySQL box – but that’s because we’re careful.

UPDATE: See what happens when I wait to blog? I forget that I read another related post over on Kevin Burton’s blog. Like Kevin, we’re using O_DIRECT, but unlike Kevin, this doesn’t solve the problem for us. Linux still swaps. We use the latest 2.6.18-53.1.14.el5 kernel from CentOS 5, btw. (Sorry, had posted 2.6.9 because I was dumb. We’re fully patched)
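(For anyone new to this, “using O_DIRECT” refers to InnoDB’s flush method, which bypasses the OS page cache for InnoDB data file I/O:)

# in my.cnf, under [mysqld]:
#   innodb_flush_method = O_DIRECT
# confirm with:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';"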

Categories: datacenter, MySQL