If you have been dealing with large MySQL datasets for a while, you may have faced an interesting case like this at some point – one that involves a table whose data is merged with INSERT ... ON DUPLICATE KEY UPDATE queries.
Even if you haven’t faced such a situation so far, you can probably guess that such a table will show extremely slow performance, and the culprit is most likely the multitude of partitions created for this table every day. In this blog, we will examine whether changing the number of partitions can improve MySQL database performance.
Improve MySQL Database Performance by Paying Attention to Partitions!
Surprisingly, different statement types are affected to different degrees, and this too has an impact on performance.
To understand this problem a bit more clearly, let us consider an example where we create a test table –
CREATE TABLE `Partition` (
  `number` int(10) unsigned NOT NULL,
  `name` int(10) unsigned NOT NULL,
  PRIMARY KEY (`number`)
)
PARTITION BY RANGE (number) (
  PARTITION p100000 VALUES LESS THAN (100001),
  PARTITION p200000 VALUES LESS THAN (200001),
  PARTITION p300000 VALUES LESS THAN (300001),
  PARTITION p400000 VALUES LESS THAN (400001),
  PARTITION p500000 VALUES LESS THAN (500001),
  PARTITION p600000 VALUES LESS THAN (600001),
  PARTITION p700000 VALUES LESS THAN (700001),
  PARTITION p800000 VALUES LESS THAN (800001),
  PARTITION p900000 VALUES LESS THAN (900001),
  PARTITION p1000000 VALUES LESS THAN (1000001)
);
In this example, we vary the number of partitions between 1 and 1,000. We also load the table with one million sequential values through bulk INSERT statements, where the `number` and `name` columns are set to the same value and each statement inserts a thousand rows.
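As a sketch of the load described above (the exact batch contents are an assumption, not taken verbatim from the original test), each bulk statement might look like this:

```sql
-- Hypothetical bulk load: 1,000 rows per statement, with `name`
-- mirroring `number`, repeated until 1,000,000 rows are loaded.
-- Note: VALUES() in the UPDATE clause is deprecated in MySQL 8.0.20+
-- in favor of row aliases, but remains the widely used form.
INSERT INTO `Partition` (`number`, `name`)
VALUES (1, 1), (2, 2), (3, 3) /* ... up to (1000, 1000) */
ON DUPLICATE KEY UPDATE `name` = VALUES(`name`);
```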
Time Taken to Load in Different Scenarios:
- Loading this data takes around 10 seconds with a single partition, 11 seconds with ten partitions, 17 seconds with a hundred partitions and 24 seconds with a thousand partitions.
- In other words, as the number of partitions grows from one to a thousand, loading slows down by roughly 2.5 times.
- Such a regression is somewhat unexpected, given that each insert statement touches only a few of the partitions. Moreover, it worsens when the user tests an update pattern with the clause “set name=name+1”; the loss in performance is at least five times, and the slowdown from a hundred to a thousand partitions is far more drastic than it was in the insert-only case.
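The update-heavy pattern mentioned above could be exercised with a statement along these lines (a sketch; the keys shown are placeholders, and the column name follows the example table rather than the original benchmark):

```sql
-- Re-inserting keys that already exist, so every row
-- takes the UPDATE branch with the "name = name + 1" pattern.
INSERT INTO `Partition` (`number`, `name`)
VALUES (1, 1), (2, 2) /* ... existing keys ... */
ON DUPLICATE KEY UPDATE `name` = `name` + 1;
```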
- Finally, the difference magnifies when the user eliminates the index on the updated column. The UPDATE branch of the INSERT ... ON DUPLICATE KEY UPDATE statement then takes 23 seconds for a single partition and more than 240 seconds for a thousand partitions – more than ten times as long!
This issue with partitions impacts both MyISAM and InnoDB: with MyISAM and no indexes, the update query can take ten seconds for 1 partition and upwards of fifty seconds for 1,000 partitions.
Why is this Happening?
The dramatically increased load times can be traced to one of two suspects – either the per-statement overhead of setting up all the partitions for execution, or the per-row overhead incurred while executing the statement.
Testing batch performance with varying numbers of rows per batch shows no significant difference between batches of 100 rows and batches of 10,000, which points to per-row overhead as the answer.
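The batch-size comparison could be reproduced along these lines (a sketch; the row counts come from the test above, but the statements themselves are assumptions, timed from the client):

```sql
-- Same total data volume, different rows per statement.
-- Variant A: 100 rows per INSERT ... ON DUPLICATE KEY UPDATE statement.
INSERT INTO `Partition` (`number`, `name`)
VALUES (1, 1), (2, 2) /* ... 100 rows ... */
ON DUPLICATE KEY UPDATE `name` = VALUES(`name`);
-- Variant B: the same load with 10,000 rows per statement.
-- If wall-clock times stay similar, per-statement overhead
-- is not the bottleneck – the cost is per row.
```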
Additionally, this test reveals that update statements in a CPU-bound workload can be about five times slower than insert statements.
Users should certainly keep an eye on the number of partitions in use and think about future needs before creating partitions in advance. This simple step can improve MySQL database performance to a large extent; if it is not enough in your case, it might be wise to consider tuning tools for MySQL, such as Tosska SQL Tuning Expert (TSEM™) for MySQL.
This is because load times and other such results are highly dependent on workloads and every case is different. Nevertheless, these tools will prove indispensable for improving your database’s performance.
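As a practical starting point for keeping an eye on partition counts, a query against information_schema can list how many partitions each table carries (a sketch; the schema name is a placeholder):

```sql
-- Count partitions per table in a given schema.
-- Non-partitioned tables have PARTITION_NAME = NULL and are excluded.
SELECT TABLE_NAME, COUNT(*) AS partition_count
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'your_schema'
  AND PARTITION_NAME IS NOT NULL
GROUP BY TABLE_NAME
ORDER BY partition_count DESC;
```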