Home > datacenter, MySQL > Great things afoot in the MySQL community

Great things afoot in the MySQL community

December 23, 2008

tl;dr: The MySQL community rocks. Percona, XtraDB, Drizzle, SSD storage, InnoDB IO scalability challenges.

For anyone who lives and dies by MySQL and InnoDB, things are finally starting to heat up and get interesting. I’ve been banging the “MySQL/InnoDB scales poorly” drums for years now, and despite having paid Enterprise licenses, I haven’t been able to get anywhere. I was pretty excited when Sun bought MySQL since their future is intrinsically tied to concurrency, but things have been pretty slow going over there this year.

But the community has finally taken up arms and is fighting the good fight. It’s (finally!) a great time to be a MySQL user because there’s been lots of recent progress. Here’re some of my favorites (and highlights of work left to do):

PERCONA

I can’t sing Percona’s praises enough. They’re probably the most knowledgeable MySQL experts out there (possibly even including Sun). Absolutely the best bang for the buck in terms of MySQL service and support – better than MySQL’s own offering. (If I had to guess why that is, I’d bet that MySQL/Sun don’t want to step on Oracle’s toes by fixing InnoDB – but >99% of what we need is related to InnoDB. Percona has no such tip-toeing limitations.) Let me quickly count the ways they’ve helped me in the last few months:

  • They knew of a super obscure configuration setting “back_log“. Have you ever heard of it? I hadn’t. But we started seeing latency on MySQL connections (up to *3 seconds*!) on systems that hadn’t changed recently (exactly 3 seconds sounded awfully suspicious, and sure enough, it was TCP retries). After going through every single kernel, network, and MySQL tuning parameter I know (and I know a lot), I finally called Percona. They dug in, investigated the system, and unearthed ‘back_log’ within an hour or two. Popped that into my configuration and boom, everything was fine again. Whew!
  • We have servers that easily exceed InnoDB’s transaction limits. Did you know InnoDB has a concurrent transaction limit of 1024? (Technically, 1024 INSERTs and 1024 UPDATEs. But INSERT … ON DUPLICATE KEY UPDATE manages to chew up one of each). I know all about it – I’ve had bugs open with MySQL Enterprise for more than 2 years on the issue. What’s more, these are low-end systems – 4 cores, 16GB of RAM – and they’re no-where near CPU or IO bound. It took MySQL months to figure out what the problem was (years, really, to figure out all the final details like the different undo logs for INSERT vs UPDATE). Their final answer? It’ll be fixed in MySQL 6. :( Note that 5.1 *just* went GA after years and years. On the other hand, it took Percona one weekend to diagnose the problem, and 13 days to have a preliminary patch ready to extend it to 4072 undo slots. Talk about progress! (And yes, we want Percona to release the patch to the world)
  • Solving the CPU scaling problems. These have been plaguing us for years (we have had some older four-socket systems for awhile … now with quad-core, it’s even worse), and thanks to Google and Percona, this problem is well on its way to being solved. We’re sponsoring this work and can’t wait to see what happens next.
  • XtraDB. This is the biggy. So big it deserves its own heading….

XTRADB

Oracle’s done a terrible job of supporting the community with InnoDB. The conspiracy theorists can all say “I told you so! Oracle bought them to halt MySQL progress” now – history supports them. Which is a shame – Heikki is a great guy and has done amazing work with InnoDB, but the fact remains that it wasn’t moving forward. The InnoDB plugin release was disappointing, to say the least. It addressed none of the CPU or IO scalability issues the community has been crying about for years.

Luckily, Percona finally did what everyone else has been too afraid to do – they forked InnoDB. XtraDB is their storage engine, forked from InnoDB (and then turbocharged!). We’re not running it in production yet, but we are running all of the patches that went into XtraDB and I can tell you they’re great. We’re sponsoring more XtraDB development (and yes, we made sure Percona will be contributing anything they build for us back to the community) with Percona, and I’m sure that’ll continue.

DRIZZLE

I’ve already blogged a bit about Drizzle, but it sure looks like Drizzle + XtraDB might be a match made in heaven. Drizzle can be though of as a MySQL engine re-write with an eye towards web workloads and performance, rather than features. MySQL 4.1, 5.0, and 5.1 added a lot of features that bloated the code without offering anything really useful to web-oriented workloads like ours, so the Drizzle team is ripping all that stuff back out and rethinking the approaches to the things that are being left in. Very exciting.

SSD STORAGE

The advent of “cheap enough” super-fast SSD storage is finally upon us. I’ve got Sun S7410 storage appliances in production and they’re blazingly fast. I have a very thorough review coming, but the short version is that even with NFS latencies, we’re able to do obscene write workloads to these boxes (let alone reads). 10000+ write IOPS to 10TB of mirrored, crazy durable (thanks ZFS!) storage is a dream come true. Once you mix in snapshots, clones, replication, and Analytics – well, it just doesn’t get much better than this.

(Don’t get sticker shock looking at the web pricing – no-one pays anything even remotely like that. Sign up for Startup Essentials if you can, or talk to your Sun sales rep if you can’t, and you can get them much cheaper. I nearly had a heart attack myself until I got “real” pricing. Tell them I sent you – enough Sun people read this blog, it might just help :) ).

STILL NEEDED…

So, all in all, there’s been an awful lot of progress this year, which is great. CPUs are finally scaling under InnoDB, and we finally have storage that isn’t bounded by physical rotation and mechanical arms. Unfortunately, great CPU scaling plus amazing IO capabilities isn’t something InnoDB digests very well. As is common in complicated systems, once you fix one bottleneck, another one elsewhere in the system crops up. This time, it’s IOPS. It was eerie reading Mark Callaghan’s post about this last night – I’d come to the exact same conclusions (from an Operations point of view rather than code-level) just yesterday.

Bottom line: Despite having ample CPU and ample IO, InnoDB isn’t capable of using the IO provided. You can bet we’ll be working with Percona, Google and Sun (read: sitting back and admiring their brilliant work while writing the occasional check and providing production workload information) to look into fixing this.

In the meantime, we’re back to the old standbys: replication and data partitioning. Yes, we’re stacking lots of MySQL instances on each S7410 to maximize both our IOPS and our budget. Fun stuff – more on that later. :)

UPDATE: Just occurred to me that there are plenty of *new* readers to my blog who haven’t heard me praise Google and their patches before. Mark Callaghan’s team over at Google definitely deserves a shout-out – they’ve really been a catalyst for much of this work along with Percona.

  1. December 24, 2008 at 1:18 am

    Hey.

    Percona is a bit like the police department. If they solve all crime they'll put themselve out of business :)

    Hopefully there are plenty of InnoDB bugs left for them to fix…

    Our biggest headache at the moment is slow innodb recovery after a DB shard resync… It's single core bottlenecked.

    I want to see this fixed in 2009.

    And yeah…. Oracle has really dropped the ball here.

    There's no innovation in InnoDB – at least not in the areas I care about…

  2. December 24, 2008 at 3:06 am

    I learn a lot from the Percona blog and I am thrilled that they work on InnoDB.

  3. December 24, 2008 at 3:09 am

    Reading this makes me feel like I'm smart about Databases…sad I know.

  4. December 24, 2008 at 3:19 am

    Great to see you talking about the S7410, if enough people read your blog, it just might help spread the word to people still using other outrageously priced storage that doesn't have ZFS, analytics, etc.

  5. December 24, 2008 at 4:49 am

    Percona rocks! We are working with them to improve the advisors of MONyog MySQL Monitor and they have the best MySQL experts on the planet.

  6. Rick James
    December 24, 2008 at 4:52 am

    In a web environment, having more than a few dozen transactions running simultaneously is a sign of design or query optimization problems. (I'm in no hurry for increasing the 1024 limit.) Latency is important, but it can suffer when there are too many threads running.

    It would be nice if locks were fixed (Mark C has done good work here) so that 8 cores would perform better than 4.

    • December 24, 2008 at 5:03 am

      I'm not sure I understand your assertion. Are you saying, for example, that Facebook shouldn't have more than "a few dozen transactions running simultaneously" across their entire platform? How are they supposed to accomplish that with hundreds of millions of users? And if they do, they have poor design and/or query optimization?

      That doesn't seem to make sense, so I must be misunderstanding your point…

  7. Matthew Montgomery
    December 24, 2008 at 5:00 am

    Um, no… Oracle doesn't want to step on Oracle's toes by fixing InnoDB. Sun/MySQL does not maintain InnoDB and Sun can't license anything but the mainline InnoDB tree. Sun/MySQL wants those patches to be integrated. MySQL isn't trying to piss on it's own users, InnoDB is. I'm glad percona is doing this work to improve InnoDB, and proving its stability. Now if we can get InnoDB to stop denying it is of good quality and merge them finally it would clear Sun of this bad rap.

    • December 24, 2008 at 5:27 am

      MySQL or Sun could fork InnoDB. This seems inevitable, but it's taking them awhile to come to that conclusion.

      • December 24, 2008 at 6:45 am

        If they did, they won't be able to release a non-GPL version of the fork

  8. December 24, 2008 at 5:09 am

    A lot of work remains to be done to make read-write workloads scale beyond 4 cores with InnoDB. The Google and Percona patches solve most of the problems for read-only workloads (at least for 8 and 16 cores). There are many SMP scaling problems above the storage engine level. These don't get enough attention. They will be apparent once XtraDB becomes popular. The InnoDB problems can be fixed with a new plugin. How are the problems in MySQL source to be fixed? Do we have to wait for MySQL 6?

    • December 24, 2008 at 5:15 am

      Personally, I'm hoping Drizzle solves that for us. Like I said, I'm hoping Drizzle + XtraDB gets us on the right path.

      If not, you're absolutely right, MySQL is going to be a problem. :(

      Thanks again for all of your work on these issues. I'm not sure if you hadn't "broken the ice" and released all those great patches that we'd be where we are today. Your patches were the catalyst for all of this InnoDB forward motion this year, I think. If there's ever anything I can do to repay that effort, I'm all ears – just let me know.

      Thanks!

    • December 24, 2008 at 5:20 am

      Oh, and comments seem to be broken on your blog – the CAPTCHA wasn't working for me, so I coudln't reply to your post the other night.

      • December 24, 2008 at 5:31 am

        I have received a few comments, but not as many as I hoped for.

        Yasufumi Kinoshita gets the credit for starting us down the path of patching InnoDB. But I work for a high-profile company and blog more frequently. Regardless, Yasufumi and others are now busy at Percona improving and supporting their changes and a few of us at my employer are also at work so the future is brighter. I would like for MySQL to explain their plans for fixing the non-InnoDB SMP problems.

  9. Ken Jacobs
    December 24, 2008 at 5:36 am

    Don and Kevin, I'm really disappointed to see your comments about how well Oracle is supporting InnoDB. "Conspiracy theorists can all say I told you so!", but they would be wrong (as conspiracy theorists often are).

    One of the first things we did after the acquisition was to develop and release for 5.1.22 a significant scalability improvement in the way InnoDB generates auto-increment values. Perhaps you missed this enhancement: http://dev.mysql.com/doc/refman/5.1/en/innodb-aut… Before this change, generating auto-increment numbers was a serious point of pain for many users. As Brian Aker wrote (http://krow.livejournal.com/553585.html),"It is great to see the Innodb development team working on performance (and Oracle's commitment to continuing development on Innodb)".

    Many of the features released in the InoDB Plugin were under development for years before Oracle acquired Innobase OY. We have completed and released these long-awaited and long-promised features (including fast index creation, which is a huge and very important enhancement that affects almost all users). In the past three years, we have increased the number of developers working on InnoDB. We have significantly upgraded the QA/testing procedures used by the InnoDB team. We found (and fixed) many problems in InnoDB that users would likely not have found or reported (or would have been hard to reproduce), yet that could have been catastrophic in practice (e.g., recovery problems).

    For some workloads and data sets, the new compression capability in the InnoDB Plugin is a fantastic win. In many cases, it reduces file size, cuts down on i/o and improves throughput. Not all workloads can benefit, but you might want to give it a try to see if it works for you.

    We have also added a number of key features for relability and ease of use (for example, the file format management system that preserves upward and downward compatibility). To hear you tell it, nothing useful has come out of this team at all, but clearly many other bloggers have found tremendous value in the new features and performance of the InnoDB Plugin, not to mention its reliability.

    You're right that "Heikki is a great guy and has done amazing work with InnoDB, but the fact remains that it wasn't moving forward." It wasn't moving forward (fast enough) prior to the acquisition, and we have done all we can to accelerate development. Honestly, part of the problem here is that we have been bound/limited by the MySQL 5.1 schedule. When 5.1 was declared "RC", there was little or no opportunity to include significant new features. Remember that the plugin architecture was first introduced with 5.1. Thus, our first opportunity to release software more quickly than MySQL itself was with 5.1. We just released 1.0.2 of the InnoDB Plugin for MySQL's GA release (5.1.30) within days of is release. It is a shame that MySQL's architecture is such that a strict version number match is required for binary compatibility of the Plugin, but we will endeavor to release well tested and documented software for 5.1 (and 6.0) as quickly as we can.

    [See next entry.]

  10. Ken Jacobs
    December 24, 2008 at 5:36 am

    [continued]

    We're focused on retaining InnoDB's lead as the fastest, most functional and reliable storage engine available for MySQL. You will continue to see new features and performance and scalability improvements over time. Because we can now release InnoDB independently of MySQL, we can and will advance the state of the art more quickly than otherwise. Please give us some credit for taking advantage of this plugin architecture. Sun/MySQL has not yet accepted our offer to incorporate, distribute, or support the InnoDB Plugin, so it seems unfair to heap all the criticism on Oracle's/Innobase's door.

    Specifically with reference to "scalability and performance improvements" some cautionary remarks are appropriate. We all respect the folks at Percona. They are indeed very smart guys. But, how much testing and vetting of the changes have they done? Just because a patch is available doesn't mean it is well-tested, and works for a broad variety of workloads. Many of the published performance results are for in-memory databases or workloads that may not be representative. (Of course, if you are "providing production workload information" to Percona and Google, we'd all love to see you describe the workloads and share your results!) Additionally, you should consider whether or not the changes create unseen/unknown changes in on-disk formats. These could result in corruptions if the database was subsequently used with a different version of the software.

    For our part, we have been looking very closely at some of the performance/scalability patches available from the community for a while. We've found some bugs in some of those patches, and we've been working to see they are repaired before these changes are incorporated in software we release. You of course know that making changes in low-level code related concurrency is complex and risky. No one wants to see corruptions, hangs, deadlocks or crashes. It is precisely from small, esoteric bugs in this level of code that such problems arise. We're not saying the Percona code is buggy, but we are saying that one has to be very careful in relying on community contributions, especially in the area of concurrency control. No one on the planet understands InnoDB as well as Heikki and his team, so you can count on them to fully understand and test any performance and scalability changes we will incorporate.

    We're glad to see the work Percona and Google are doing. It's teriffic that their changes will be released to the community (and, under the GPL must be released if they distribute the software). As we've noted elsewhere, we are not opposed in principle to accepting such contributions. There are three simple criteria: technical reliability and correctness, appropriateness (e.g., we may have a better solution planned or under development) and suitable intellectual property provisions (e.g. a BSD license).

    So, while you may want to join the "conspiracy theorists", you may want to reconsider, after you look at the facts. Our track record has been one of a good partner and responsible developer of a critical piece of technology MySQL and its users depend on daily. We're contiuing our investment in InnoDB, and all I can say at this point is "stay tuned, more to come".

    • December 24, 2008 at 5:54 am

      Ken,

      With all due respect, I've asked you and Heikki to look into these issues for *4 years*, and MySQL to do the same, with the same end-result: no movement.

      You absolutely deserve props for the AUTO_INCREMENT patch. IMHO, it took too long to arrive, but better late than never – it works very well, no doubt. And I'm positive the InnoDB plugin was a big win for lots of your customers, but it wasn't for us. I prefer on-disk compression using ZFS to table compression, and disk spaces / throughput aren't our bottlenecks: we have more storage and IOPS capacity than InnoDB is able to make use of.

      Clearly, you guys have been working hard on what you think is important for your customers. Which means, just as clearly, we must not be your target customer. There's nothing wrong with that – I respect businesses that choose their customers carefully and then do everything to support them. That's how I run my own business.

      I just wish I could convince you that I'm a good target customer, but after years of pain and pleading, it's just become apparent that just isn't going to happen. So we've had to find solutions elsewhere. Percona stepped up to the plate.

      You're absolutely right about the nature of community patches, and it's something that keeps me up at night worrying, but at the same time, InnoDB right now is so hard to scale, I'm stuck between a rock and a hard place: A site that's down or community patches that might affect our workload in unknown ways.

      Given those two choices, I'd rather be part of the solution: by running our workload on the community patches, and providing feedback, we can help work out the kinks in those patches. It might make for some short-term pain, but we believe that it'll result in long-term gains. We're ok with that. *knocks on wood*

      Of course, we roll these things out gradually. We set up one slave out of production, test on it, then gradually move it into production, test on it, then roll it out to more slaves and clusters. It's not like we're just replacing every MySQL instance overnight with these patches – we're trying to carefully gauge their impact as best we can.

      Honestly, it's sad we have to go down this route. I'd much rather pay Oracle or Sun or whomever will take my money to solve these problems for us instead of going the community route. But what choice do I have?

  11. December 24, 2008 at 6:32 am

    Hi Don!

    For Drizzle we have published a tree for XtraDB so that we can track its changes (also, we have a much larger battery of tests for testing Innodb then MySQL does… we are curious to see how the engine works overall). You can find it here:
    https://code.launchpad.net/~mordred/drizzle/xtrad

    We have moved our internal Innodb to using the Innodb plugin. I can confirm Peter's findings on regression (which is that there is little to none). There is a bigger issue that the plugin doesn't work with the new MRR, but frankly MRR has turned out to be completely unstable.

    One thing also worth watching is this:
    https://code.launchpad.net/~drizzle-pbxt/drizzle/

    We stay strongly interested in what Paul is doing for PBXT. There has been some mention of it out performing other engines for SSD. We haven't confirmed this yet, but I, and a number of others, continue to find his work interesting (and he is progressing quickly).

    Happy Holidays!

    Cheers,
    -Brian

  12. Ken Jacobs
    December 24, 2008 at 6:34 am

    Don, thanks for your response … we like getting props! ;-) Maybe if there were a few more props being handed out, I'd have less trouble reading statements like some of those above about history and Oracle's behavior and intentions.

    I understand that some people (and I don't mean you) in the blogosphere get emotional or fanatical or religious, but I just believe a little more balanced discussion and wider dissemination of facts (along with opinion) would be more helpful to everyone concerned.

    We'd be interested why you find ZFS more suitable for your workload and whether or not you've compared it to the built-in compression in the InnoDB Plugin. If you have some performance data, we'd love to see it!

    We do think you're an appropriate target customer. But, I don't recall any conversations with you about your concerns, or any specific feedback and performance testing results you've provided. (AFAIR my only interactions with SmugMug have been as a user!). We'd like to better understand your priorities. Feel free to share more information (publicly or privately) about your workload, your requirements, your performance testing, your configuration, etc. I can't promise that we will focus exclusively on your needs, nor can I promise that we'll agree about the proper technical approach (e.g., on the number of concurrent transactions issue), but your input is welcome. You can be part of the solution (by running your workload and providing feedback) on official InnoDB releases. Please let us know how we can engage with you more effectively.

    We're glad you are successful with InnoDB (with or without third party patches). Evidently knocking on wood and doing your own testing is effective for you! ;-) Of course every user should always test the software they are using for their own environment. While you can make the tradeoffs required to deploy the third party patches, not every user can afford to do so. And, since we and MySQL must license our software commercially, there are constraints on what we can do.

    Best wishes and happy holidays!

  13. December 24, 2008 at 7:02 am

    Don, what would be interesting is if you and others could post a list of recommended patches for those just starting out. I know that not all patches apply to all workloads but it is useful to at least see a decent baseline.

  14. December 24, 2008 at 8:27 am

    Ken,

    Quick comment (not being involved into discussing of development process), yes – many benchmarks are related to "in-memory" databases. But many customers are trying to keep at least active dataset in memory. And currently as 64GB RAM is available at appropriate price – it is easier then before. This means "in-memory" case is getting more important. And 16 / 24 cores servers are coming. Is InnoDB ready for this challenge ?

    And about IO-bound cases – our / Google IO patches are addressing problems with IO scalability. I published benchmark related to IO-bound cases.

  15. Eric
    December 24, 2008 at 12:53 pm

    Did you buy 1 pair S7410(s) for HA ?

    • December 24, 2008 at 12:58 pm

      We bought 4 clustered configs, yes. S7410c I think they're called. Two head units, each with one "Readzilla" SSD read accelerator, and one JBOD with 22 1TB disks and 2 "Logzilla" SSD write accelerators.

  16. January 13, 2009 at 10:00 am

    Thanks for the information about Percona. I actually hadn't heard of them.

    Thanks for information.

    Misty

    Check out http://www.Viscape.com!

    Write a destination review, and you could win a trip to Belize or Key West!

  17. January 26, 2009 at 7:02 pm

    Эрудит – человек, который всегда найдет синоним, если не знает как

  18. January 26, 2009 at 7:04 pm

    Быть вторым означает быть основным блюдом.

  19. February 10, 2009 at 3:16 am

    Great information. Thanks again for showing us all the great tools and vendors you are using to keep Smug Mug kick ass.

  20. oliver
    July 17, 2009 at 10:05 am

    Thanks for a very informative post! Are you able to elaborate on the S7410 pricing at all? Half sticker price?

  21. October 12, 2009 at 11:56 am

    super blog//////
    thanks for the post!

  22. November 14, 2009 at 3:35 pm

    Hey Don, wish I could have joined you but it's a bit far from Switzerland ;) Luckily I managed to catch it at IMAX Irvine, OC.

  1. December 23, 2008 at 8:16 pm
  2. December 24, 2008 at 8:09 am
  3. December 24, 2008 at 9:06 am
  4. January 2, 2009 at 10:34 am
  5. January 7, 2009 at 6:03 pm
  6. January 9, 2009 at 8:32 am
  7. February 19, 2009 at 6:16 pm
Comments are closed.
Follow

Get every new post delivered to your Inbox.

Join 34 other followers

%d bloggers like this: