MySQL: improve InnoDB insert performance

I think I've figured out what's going on, but I'm not sure exactly why. In the large REPLACE statement, there are two subselects that cause the query to go from 3 seconds to several minutes:

(SELECT Max(DealerId) FROM LoanApplicationDealer LAD WHERE LAD.LoanApplicationId=NewestLoanApplicationID(377163))

and

(SELECT FileProcessDate FROM LoanApplicationVersion LAV WHERE LAV.Id=LoanApplicationIDMaxVersion(377163))

The problem is the same in both statements. In the first query, it's pulling the ENTIRE LoanApplicationDealer table and comparing each LoanApplicationId to the result of the function NewestLoanApplicationID(377163), which itself has to do a select or two on some other tables. During an insert transaction, this function will only ever return one value for any particular parameter, so there's no point in running it over and over again. That's the looping I was seeing in the logs. If I move that call out into a variable, the function should run only once and the queries will pull a single row via the indexes on that table.

The only question remaining is why the new server has to check every value and the old server does not. Perhaps the old server was caching the value from NewestLoanApplicationID(377163)? I'm not sure why the new server wouldn't; in any case, I can pull that function out into a variable and the code will be portable between the old and new servers.
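
For illustration, here is a minimal sketch of that change against the two subselects quoted above (the @variable names are mine, and the surrounding REPLACE statement is omitted):

-- Evaluate each function exactly once, before the big REPLACE
SET @newest_app_id  := NewestLoanApplicationID(377163);
SET @max_version_id := LoanApplicationIDMaxVersion(377163);

-- The subselects then become single indexed lookups
(SELECT Max(DealerId) FROM LoanApplicationDealer LAD WHERE LAD.LoanApplicationId = @newest_app_id)

(SELECT FileProcessDate FROM LoanApplicationVersion LAV WHERE LAV.Id = @max_version_id)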

Update: Confirmed. Pulling that function out into a variable and then using it in the subquery lowered run times per job from about 15 seconds to 7 on the old server. The new server is even faster.


07 Jan 2011

I'm again working with a system that needs a high insertion rate for data which generally fits in memory. The last time I worked with a similar system, it used MyISAM and was built using multiple tables. Using multiple key caches was a good solution at the time, and we could get over 200K inserts/sec.

This time I worked with InnoDB tables… it was a different system with a different table structure, not to mention different hardware, so it can't be compared directly. Still, it is nice to see that you can get numbers just as high with InnoDB too.

I will spare you all the experiments we went through and just share the final numbers. On an 8-core Opteron box we were able to achieve 275K inserts/sec, at which point the load started to become I/O bound because of log writes and the flushing of dirty buffers. I'm confident you can get to 400K+ inserts/sec on faster hardware and disks (say a better RAID array or flash), which is a very cool number. Of course, mind you, this is in-memory insertion into a simple table; a table with long rows and a bunch of indexes will see lower numbers.

So what's the deal? First, MySQL 5.5 (frankly, I did not try Percona Server 5.1 in this case). With MySQL 5.1 and the InnoDB Plugin we could see 40%+ of CPU wasted on mutex spin locks (per oprofile), which went down to about 15% in MySQL 5.5.8 with 8 concurrent threads. This shows both that there are substantial gains and that there is room for more performance optimization. Dmitri has good suggestions on tuning MySQL 5.5, and that is what I used as a starting point. Using multiple buffer pools with innodb_buffer_pool_instances=8 was very important.
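
For a rough sketch of what that looks like in my.cnf (the sizes below are placeholders for illustration, not the values used in this benchmark):

[mysqld]
# Split the buffer pool to reduce mutex contention (MySQL 5.5+)
innodb_buffer_pool_instances   = 8
innodb_buffer_pool_size        = 16G
# Larger redo logs reduce checkpoint pressure under heavy inserts
innodb_log_file_size           = 1G
innodb_log_buffer_size         = 64M
# Relax per-commit log flushing if the workload can tolerate it
innodb_flush_log_at_trx_commit = 2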

Second thing – partitioning. Unfortunately MySQL 5.5 leaves a huge bottleneck for write workloads in place: there is a per-index rw-lock, so only one thread can insert an index entry at a time, which can be a significant bottleneck. We got 2x+ better performance by hash-partitioning the table on one of the columns, and I would expect the gains to be even higher with more cores. PARTITION BY HASH(col) PARTITIONS 8 is what we used. This looks like a good workaround, but remember that partitioning can dramatically impact the performance of your SELECT queries.
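
To illustrate, a minimal sketch of such a table definition (the table and column names here are made up; only the PARTITION BY HASH ... PARTITIONS 8 part comes from the workload above):

CREATE TABLE events (
  id      BIGINT UNSIGNED NOT NULL,
  user_id INT UNSIGNED    NOT NULL,
  payload VARCHAR(255)    NOT NULL,
  -- every unique key must include the partitioning column
  PRIMARY KEY (id, user_id),
  KEY (user_id)
) ENGINE=InnoDB
  PARTITION BY HASH(user_id) PARTITIONS 8;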

The inserts in this case are of course bulk inserts… with single-value inserts you would get much lower numbers. In fact we used LOAD DATA INFILE, which is one of the ways to get great performance (the competing way is to use prepared bulk INSERT statements).
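
For reference, a bulk INSERT simply packs many rows into a single statement; a hypothetical example against the events table sketched above:

-- One round trip and one commit for many rows
INSERT INTO events (id, user_id, payload) VALUES
  (1, 17, 'a'),
  (2, 42, 'b'),
  (3, 99, 'c');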

We need to try new Percona Server 5.5 on our Cisco box to see if we can get to 500K inserts/sec – this can be a nice round number 🙂

Author

Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.



Sometimes MySQL needs to work hard. I've been working on an import script that fires a lot of INSERTs. Normally our database server handles 1,000 inserts/sec. That wasn't enough. So I went looking for ways to improve the speed of MySQL inserts and was finally able to increase this number to 28,000 inserts per second. Check out my late-night benchmarking adventures.

I'm going to show you the result of 3 approaches that I tried to boost the speed of 'bulk' queries:

  • Delayed Insert
  • Transaction
  • Load Data

This article focuses on the InnoDB storage engine.

Delayed Insert

MySQL has an INSERT DELAYED feature. Despite the name, this is actually meant to speed up your queries ; ) And from what I understand it does a very good job.
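
Syntactically it is just an extra keyword; a minimal example (the log table here is hypothetical, and as noted next the table's storage engine matters):

INSERT DELAYED INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!');
-- The client returns immediately; the row is queued and written when the table is not in use.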

Unfortunately it only works with MyISAM, MEMORY, ARCHIVE, and BLACKHOLE tables.

That rules out my favorite storage engine of the moment: InnoDB.

So where to turn?

Transaction

A Transaction basically combines multiple queries into one 'package'. If one query in this package fails, you can 'cancel' (roll back) all the queries within that package as well.

That provides additional integrity for your relational data: if record A depends on record B, and B could be deleted while A could not, you are left with a broken dependency in your database, and that corruption could easily have been avoided by using a Transaction.

Let me show you how easy a transaction really is in basic PHP/SQL terms:

<?php
mysql_query("START TRANSACTION");
mysql_query("INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')");
?>

<?php
mysql_query("INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')");
?>

<?php
mysql_query("INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')");
?>

<?php
mysql_query("COMMIT"); // Or "ROLLBACK" if you changed your mind
?>

OK moving on :)

Transaction Performance - the Theory

I showed you the integrity gain. That's reason enough to 'go Transactional' right now. But as an added bonus, Transactions could also be used for performance gain. How?

  • Normally (with autocommit on) every single insert is its own transaction: the indexes are updated and the change is flushed to disk for each row. That's some heavy lifting for your database.

But when your queries are wrapped inside a Transaction, that per-statement flushing is deferred until the whole bulk is committed. Saving a lot of work.

Bulk processing will be the key to performance gain.

Bench Results

So far for the theory. Now let's benchmark this. What does it gain us in terms of queries per second (qps)?

As you can see

  • I was not able to put this theory into practice and get good results.
  • There is some overhead in the Transaction which actually causes performance to drop for bulks with fewer than 50 queries.

I tried some other forms of transactions (shown in a graph below) but none of them really hit the jackpot.

OK, so Transactions are good for protecting your data and can in theory give a performance gain, but I was unable to produce one.

Clearly this wasn't the performance boost I was hoping for.

Moving on.

Load Data - the Mother Load

MySQL has a very powerful way of processing bulks of data called LOAD DATA INFILE. The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.
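
In its bare form the statement looks something like this; a hypothetical example that loads a plain CSV file into the log table used in the examples above (the file path is made up):

LOAD DATA INFILE '/tmp/log.csv'
INTO TABLE `log`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(`level`, `msg`);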

Bench Results

In the following graph I tried to insert different-sized bulks using the different methods. I recorded and calculated how much time each query took: I take the total time needed for the entire operation and divide it by the number of queries. So what you see is really what you get.

OK, enough with these so-called facts ; ) Back to the excitement :D

At 10,000 records I was able to get a performance gain of 2,124.09%

As you can see

  • Where the Transaction method had a maximum throughput of 1,588 inserts per second, Load Data allowed MySQL to process a staggering 28,108 inserts per second.
  • There is no significant overhead in Load Data, e.g. you can use it with 2 queries per bulk and still have a performance increase of 153%.
  • There is a saturation point around bulks of 10,000 inserts. After this point the queries per second rate (qps) didn't show an increase anymore.
  • My advice would be to start a new bulk every 1,000 inserts. It's what I consider to be the sweet spot, because it keeps buffers small and you will still benefit from a performance gain of 2,027.13% (see the chunking sketch below).

The next step will make your buffer 1000% bigger and it will only give you an additional performance gain of 4%.
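
Applying that advice with the mysqlBulk function described below is straightforward; a rough sketch, assuming $rows already holds the complete data set as an array of column => value arrays:

<?php
// Feed the data to mysqlBulk() in chunks of 1,000 rows,
// so every LOAD DATA bulk stays near the sweet spot.
foreach (array_chunk($rows, 1000) as $chunk) {
    if (false === mysqlBulk($chunk, 'log', 'loaddata')) {
        trigger_error('mysqlBulk failed!', E_USER_ERROR);
        break;
    }
}
?>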

So if you have a heavy-duty MySQL job that currently takes 1 hour to run, this approach could make it run within 3 minutes! Enjoy the remaining 57 minutes of your hour! :D

Load Data Quirks

Of course there is a price to pay for this performance win. Before the data is loaded, the data file must be:

  • Saved on disk (or in RAM, see my other article Create turbocharged storage using tmpfs)
  • In comma-separated values (CSV) format.

This is probably not something you want to be bothered with. So why not create a PHP function that handles these quirks for us?

Wrapping This Up in a PHP Function

Let's save this logic inside a function so we can easily reuse it to our benefit.

We'll name this function mysqlBulk and use it like this:

  • Collect our queries or data in an array (the bulk).
  • Feed that array along with the table name to the mysqlBulk function
  • Have it return the qps for easy benchmarking. Or false on failure.

Source (still working on this, will be updated regularly):

<?php
/**
 * Executes multiple queries in a 'bulk' to achieve better
 * performance and integrity.
 *
 * @param array  $data   An array of queries; the loaddata methods instead require a two-dimensional array of rows.
 * @param string $table
 * @param string $method
 * @param array  $options
 *
 * @return float
 */
function mysqlBulk(&$data, $table, $method = 'transaction', $options = array()) {
  // Default options
  if (!isset($options['query_handler'])) {
      $options['query_handler'] = 'mysql_query';
  }
  if (!isset($options['trigger_errors'])) {
      $options['trigger_errors'] = true;
  }
  if (!isset($options['trigger_notices'])) {
      $options['trigger_notices'] = true;
  }
  if (!isset($options['eat_away'])) {
      $options['eat_away'] = false;
  }
  if (!isset($options['in_file'])) {
      // AppArmor may prevent MySQL from reading this file.
      // Remember to check /etc/apparmor.d/usr.sbin.mysqld
      $options['in_file'] = '/dev/shm/infile.txt';
  }
  if (!isset($options['link_identifier'])) {
      $options['link_identifier'] = null;
  }

  // Make options local
  extract($options);

  // Validation
  if (!is_array($data)) {
      if ($trigger_errors) {
          trigger_error('First argument "queries" must be an array',
              E_USER_ERROR);
      }
      return false;
  }
  if (empty($table)) {
      if ($trigger_errors) {
          trigger_error('No insert table specified',
              E_USER_ERROR);
      }
      return false;
  }
  if (count($data) > 10000) {
      if ($trigger_notices) {
          trigger_error('It\'s recommended to use <= 10000 queries/bulk',
              E_USER_NOTICE);
      }
  }
  if (empty($data)) {
      return 0;
  }

  if (!function_exists('__exe')) {
      function __exe ($sql, $query_handler, $trigger_errors, $link_identifier = null) {
          if ($link_identifier === null) {
              $x = call_user_func($query_handler, $sql);
          } else {
              $x = call_user_func($query_handler, $sql, $link_identifier);
          }
          if (!$x) {
              if ($trigger_errors) {
                  trigger_error(sprintf(
                      'Query failed. %s [sql: %s]',
                      mysql_error($link_identifier),
                      $sql
                  ), E_USER_ERROR);
                  return false;
              }
          }

          return true;
      }
  }

  if (!function_exists('__sql2array')) {
      function __sql2array($sql, $trigger_errors) {
          if (substr(strtoupper(trim($sql)), 0, 6) !== 'INSERT') {
              if ($trigger_errors) {
                  trigger_error('Magic sql2array conversion '.
                      'only works for inserts',
                      E_USER_ERROR);
              }
              return false;
          }

          $parts   = preg_split("/[,\(\)] ?(?=([^'|^\\\']*['|\\\']" .
                                "[^'|^\\\']*['|\\\'])*[^'|^\\\']" .
                                "*[^'|^\\\']$)/", $sql);
          $process = 'keys';
          $dat     = array();

          foreach ($parts as $k=>$part) {
              $tpart = strtoupper(trim($part));
              if (substr($tpart, 0, 6) === 'INSERT') {
                  continue;
              } else if (substr($tpart, 0, 6) === 'VALUES') {
                  $process = 'values';
                  continue;
              } else if (substr($tpart, 0, 1) === ';') {
                  continue;
              }

              if (!isset($dat[$process])) $dat[$process] = array();
              $dat[$process][] = $part;
          }

          return array_combine($dat['keys'], $dat['values']);
      }
  }

  // Start timer
  $start = microtime(true);
  $count = count($data);

  // Choose bulk method
  switch ($method) {
      case 'loaddata':
      case 'loaddata_unsafe':
      case 'loadsql_unsafe':
          // Inserts data only
          // Use array instead of queries

          $buf    = '';
          foreach($data as $i=>$row) {
              if ($method === 'loadsql_unsafe') {
                  $row = __sql2array($row, $trigger_errors);
              }
              $buf .= implode(':::,', $row)."^^^\n";
          }

          $fields = implode(', ', array_keys($row));

          if (!@file_put_contents($in_file, $buf)) {
              $trigger_errors && trigger_error('Cant write to buffer file: "'.$in_file.'"', E_USER_ERROR);
              return false;
          }

          if ($method === 'loaddata_unsafe') {
              if (!__exe("SET unique_checks=0", $query_handler, $trigger_errors, $link_identifier)) return false;
              if (!__exe("SET foreign_key_checks=0", $query_handler, $trigger_errors, $link_identifier)) return false;
              // Only works for SUPER users:
              #if (!__exe("SET sql_log_bin=0", $query_handler, $trigger_errors, $link_identifier)) return false;
          }

          if (!__exe("
             LOAD DATA INFILE '${in_file}'
             INTO TABLE ${table}
             FIELDS TERMINATED BY ':::,'
             LINES TERMINATED BY '^^^\\n'
             (${fields})
         ", $query_handler, $trigger_errors, $link_identifier)) return false;

          break;
      case 'transaction':
      case 'transaction_lock':
      case 'transaction_nokeys':
          // Max 26% gain, but good for data integrity
          if ($method == 'transaction_lock') {
              if (!__exe('SET autocommit = 0', $query_handler, $trigger_errors, $link_identifier)) return false;
              // A WRITE lock is needed; a READ lock would prevent this session from inserting
              if (!__exe('LOCK TABLES '.$table.' WRITE', $query_handler, $trigger_errors, $link_identifier)) return false;
          } else if ($method == 'transaction_nokeys') {
              // Note: DISABLE KEYS only affects non-unique indexes on MyISAM tables
              if (!__exe('ALTER TABLE '.$table.' DISABLE KEYS', $query_handler, $trigger_errors, $link_identifier)) return false;
          }

          if (!__exe('START TRANSACTION', $query_handler, $trigger_errors, $link_identifier)) return false;

          foreach ($data as $query) {
              if (!__exe($query, $query_handler, $trigger_errors, $link_identifier)) {
                  __exe('ROLLBACK', $query_handler, $trigger_errors, $link_identifier);
                  if ($method == 'transaction_lock') {
                      __exe('UNLOCK TABLES', $query_handler, $trigger_errors, $link_identifier);
                  }
                  return false;
              }
          }

          __exe('COMMIT', $query_handler, $trigger_errors, $link_identifier);

          if ($method == 'transaction_lock') {
              if (!__exe('UNLOCK TABLES', $query_handler, $trigger_errors, $link_identifier)) return false;
          } else if ($method == 'transaction_nokeys') {
              if (!__exe('ALTER TABLE '.$table.' ENABLE KEYS', $query_handler, $trigger_errors, $link_identifier)) return false;
          }
          break;
      case 'none':
          foreach ($data as $query) {
              if (!__exe($query, $query_handler, $trigger_errors, $link_identifier)) return false;
          }

          break;
      case 'delayed':
          // MyISAM, MEMORY, ARCHIVE, and BLACKHOLE tables only!
          if ($trigger_errors) {
              trigger_error('Not yet implemented: "'.$method.'"',
                  E_USER_ERROR);
          }
          break;
      case 'concatenation':
      case 'concat_trans':
          // Unknown bulk method
          if ($trigger_errors) {
              trigger_error('Deprecated bulk method: "'.$method.'"',
                  E_USER_ERROR);
          }
          return false;
          break;
      default:
          // Unknown bulk method
          if ($trigger_errors) {
              trigger_error('Unknown bulk method: "'.$method.'"',
                  E_USER_ERROR);
          }
          return false;
          break;
  }

  // Stop timer
  $duration = microtime(true) - $start;
  $qps      = round ($count / $duration, 2);

  if ($eat_away) {
      $data = array();
  }

  @unlink($options['in_file']);

  // Return queries per second
  return $qps;
}
?>

Using the Function

The mysqlBulk function supports a couple of methods.

Array Input With Method: Loaddata (Preferred)

What would really give it wings is if you can supply the data as an array. That way I don't have to translate your raw queries to arrays before I can convert them back to CSV format. Obviously, skipping all that conversion saves a lot of time.

<?php
$data   = array();
$data[] = array('level' => 'err', 'msg' => 'foobar!');
$data[] = array('level' => 'err', 'msg' => 'foobar!');
$data[] = array('level' => 'err', 'msg' => 'foobar!');

if (false === ($qps = mysqlBulk($data, 'log', 'loaddata', array(
    'query_handler' => 'mysql_query'
)))) {
    trigger_error('mysqlBulk failed!', E_USER_ERROR);
} else {
    echo 'All went well @ '.$qps.' queries per second'."\n";
}
?>

Most of the time this is even easier because you don't have to write queries at all.

SQL input with method: loadsql_unsafe

If you really can only deliver raw INSERT queries, use the loadsql_unsafe method. It's unsafe because I convert your queries to arrays on the fly. That also makes it about 10 times slower (though still twice as fast as the other methods).

This is what the basic flow could look like:

<?php
$queries   = array();
$queries[] = "INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')";
?>

<?php
$queries[] = "INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')";
?>

<?php
$queries[] = "INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')";
?>

<?php
if (false === ($qps = mysqlBulk($queries, 'log', 'loadsql_unsafe', array(
    'query_handler' => 'mysql_query'
)))) {
    trigger_error('mysqlBulk failed!', E_USER_ERROR);
} else {
    echo 'All went well @ '.$qps.' queries per second'."\n";
}
?>

Safe SQL Input With Method: Transaction

Want to do a Transaction?

<?php
mysqlBulk($queries, 'log', 'transaction');
?>

Options

Change the query_handler from mysql_query to your actual query function. If you have a DB class with an execute() method, you will have to encapsulate it inside an array like this:

<?php
$db = new DBClass();
mysqlBulk($queries, 'log', 'none', array(
    'query_handler' => array($db, 'execute')
));
// Now your $db->execute() function will actually
// be used to make the real MySQL calls
?>

Don't want mysqlBulk to produce any errors? Use the trigger_errors option.

<?php
mysqlBulk($queries, 'log', 'transaction', array(
    'trigger_errors' => false
));
?>

Want mysqlBulk to produce notices? Use the trigger_notices option.

<?php
mysqlBulk($queries, 'log', 'transaction', array(
    'trigger_notices' => true
));
?>

Have ideas on this? Leave me a comment.

Benchmark Details - What Did I Use?

Of course solid benchmarking is very hard to do, and I already failed once. This is what I used.

Table Structure

I created a small table with some indexes & varchars. Here's the structure dump:

--
-- Table structure for table `benchmark_data`
--

CREATE TABLE `benchmark_data` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `user_id` smallint(5) unsigned NOT NULL,
  `a` varchar(20) NOT NULL,
  `b` varchar(30) NOT NULL,
  `c` varchar(40) NOT NULL,
  `d` varchar(255) NOT NULL,
  `e` varchar(254) NOT NULL,
  `created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `a` (`a`,`b`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

Table Data

I filled the table with ~2,846,799 records containing random numbers & strings of variable length. No 1000 records are the same.

Machine

I had the following configuration to benchmark with:

Product Name: PowerEdge 1950
Disks: 4x146GB @ 15k rpm in RAID 1+0
Memory Netto Size: 4 GB
CPU Model: Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
Operating System: Ubuntu 8.04 hardy (x86_64)
MySQL: 5.0.51a-3ubuntu5.4
PHP: 5.2.4-2ubuntu5.5

  • Provided by True.nl

Thanks to Erwin Bleeker for pointing out that my initial benchmark was shit.

Finally

This is my second benchmark so if you have some pointers that could improve my next: I'm listening.

How can I improve my InnoDB performance?

innodb_thread_concurrency: With the improvements to the InnoDB engine, it is recommended to let the engine control concurrency by keeping this variable at its default value of zero. If you do see concurrency issues, you can tune it; a commonly recommended value is 2 times the number of CPUs plus the number of disks.
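
As a sketch, on a hypothetical box with 8 CPU cores and 4 disks that rule of thumb translates to the following my.cnf snippet:

[mysqld]
# Default: let InnoDB manage its own concurrency
innodb_thread_concurrency = 0
# If you do see contention, a common starting point is 2*CPUs + disks
# (2*8 + 4 = 20 on this hypothetical machine):
# innodb_thread_concurrency = 20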

How do you make MySQL insert faster?

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.

How can I make my inserts faster?

You can use the following methods to speed up inserts: If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

Why is MyISAM faster than InnoDB?

MyISAM vs InnoDB storage performance: MyISAM does not support transactional properties and is faster for reads. For large volumes of data, however, the performance of InnoDB is better than that of MyISAM.