How to Tune substr(emp_name,5,4) SQL Statement?

sql performance monitoring

There may be some business requirements that need to compare certain part of a column as a data retrieval criteria. Here is an example SQL that retrieves data from EMPLOYEE table employee’s name with a string pattern “Acco” start from 5 character of the emp_name.

select    *
  from    employee
where    substr(emp_name,5,4)=‘Acco’

Here the following are the query plans of this SQL, it takes 17 seconds to finish. The query shows a “Full Table Scan Employee”  

You can see that this SQL cannot utilize index scan even the emp_name is indexed field. Let me add a “Force Index(emp_name_inx)“ hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_name >= ‘ ‘ , it is an always true condition that emp_name should be greater or equal to a smallest empty character.

select    *
from       employee force index(emp_name_inx)
where    substr(emp_name,5,4) = ‘Acco’
                 and emp_name >= ‘ ‘

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used now.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 6 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune “order by rand() limit 1” SQL Statement ?

sql performance monitoring

There may be some business requirements that needs to retrieve only some rows from a table (or join tables) randomly. This kind of SQL is normally hard to tune.

For example the following SQL retrieve one row from two tables join of Employee and Department, where Employee’s department code is ‘AAA’, and both Emp_dept and Dpt_id columns are indexed.

select  *
  from employee ,department
where emp_dept=dpt_id
and emp_dept = ‘AAA’
order by rand()
limit 1

Here the following is the query plan of this SQL, it takes 3.126 seconds to finish. The query shows a nested loop from Department table to Employee table to extract all records with ‘AAA’ department code. An Order operation is executed followed from the join result.

You can see the most expensive step is to extract all Employee data with department code “AAA”, if there is an unique key such Employee ID (EMP_ID) which can uniquely identify a row from the query, you can use the With common table expressions in MySQL version 8 to randomly select rows from the KEY column only with the same conditions given by the original query, for example the following blue colored SQL text in “With” clause, it randomly select 1 row of EMP_ID from the join query. It not only significantly reduces the size of the data retrieved from Employee table, but it also shrank the size of Order By operation. And then the main query will use the selected EMP_ID to extract specific row from original query, so the whole query will run faster with this new rewrite syntax.

with DT1
     as (select  EMP_ID
        from   employee,
         department
        where   emp_dept = dpt_id
         and emp_dept = ‘AAA’
        order by rand() limit 1
)
      select    *
      from       employee,
     department
      where    emp_dept = dpt_id
     and emp_dept = ‘AAA’
     and EMP_ID in (select EMP_ID
             from   DT1)
      order by rand() limit 1

/* Remark:  “order by rand() limit 1” is used to make sure that only 1 row will be selected if the EMP_ID cannot uniquely identify only one row */

Here is the query plan of the rewritten SQL with less cost and run much faster.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 100 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune “Order by Limit N” SQL Statement?

sql performance monitoring

There may be some business requirements that need to retrieve the first N number of rows from a join tables. Some people may encounter unexpected performance problem.

Here is an example SQL that retrieves first 1000 row from a join tables of Employee and Department in the order of dpt_id.  Where emp_dept and dpt_id columns are both indexed.

select *
     from employee ,department
where emp_dept=dpt_id
order by dpt_id
limit 1000

Here the following are the query plans of this SQL, it takes 1 mins and 8 seconds to finish. The query shows a nested loop from “Full Table Scan Employee” to “Unique Key Lookup Department” table to extract all records. An “Order by” operation of dpt_id of Department table is executed followed from this join result.

You can see the most expensive step is to extract all Employee data with Department data and then an Order by dpt_id  of Department table. Let’s see if we rewrite the original SQL text into the syntax that “order by dpt_id” is changed to “order by emp_dept”. It is a semantically eqvialent rewrite of original SQL statement since “emp_dept = dpt_id”.

select       *
from          employee,
                     department
where       emp_dept = dpt_id
order by    emp_dept
limit            1000

Here is the query plan of the rewritten SQL with less cost and run much faster. The new query plan shows that no “Use temporary; Using filesort” in Tabular Explain, it means the Nested Loop operation will be stop at the Limit 1000 records is done.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 70 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL with LIKE ‘%Mary%’ comparison?

sql performance monitoring

The LIKE is a logical operator that determines if a character string matches a specified pattern. A pattern may include regular characters and wildcard characters. The LIKE operator is used in the WHERE clause of the SELECT, UPDATE, and DELETE statements to filter rows based on pattern matching.

Here is an example SQL that retrieves data from EMPLOYEE table employee’s name with a string pattern like “Mary%”. If the emp_name is indexed, the following SQL will utilize Index Range Scan of the emp_name and the speed of the SQL will be fine.

select *
from employee
where emp_name like ‘Mary%’;

If user is looking for emp_name with pattern like ‘%Mary%’, MySQL SQL Optimizer cannot user emp_name index to speed up the process, full table scan is normally be used and the performance will be bad too.

select *
from employee
where emp_name like ‘Mary%’;

Here the following are the query plan of this SQL, it takes 18.8 seconds to finish. The query shows a “Full Table Scan” of employee table.

You can see that this SQL cannot utilize index scan even the emp_name is indexed. Let me add a “Force Index(emp_name_inx)“ hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_name >= ” , it is an always true condition that emp_name should always greater or equal to a smallest empty character.

select *
from   employee force index(emp_name_inx)
where  emp_name like ‘%Mary%’
       and emp_name >= ‘ ‘;

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used now.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 5 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

Analyzing the Inner Works of MySQL SQL Performance Tuning and Oracle Database

When it comes to MySQL, slow performance in large tables is one of the main sources of complaints. It is true that some users face problems as their database fails to sufficiently handle a more than a certain number of rows.

However, there also are many corporations that use MySQL for millions, even billions of rows of data and yet they successfully deliver excellent promise. So, why is there a contradiction between these two cases? The answer lies in understanding the intricacies of table designing in MySQL, and with the help of MySQL SQL performance tuning, how to make them work in your favour.    

What to Consider During MySQL SQL Performance Tuning

There are three major aspects of the database that can have an impact on databases with huge amounts of data. Let’s take a look at two of them now:

Buffers

The first thing to consider with any database management system is that you must have an estimate of the memory, even as you progress in terms of data accumulation. It is important for the memory to be sufficient because performance suffers greatly if it isn’t, so don’t be surprised if a drop in performance is greater than you anticipated because you may have lost track of the growth in data size and subsequently, the need for more memory space. This applies to the other aspects covered in this blog as well. Once data outgrows the memory, everything can be expected to slow down, and MySQL database and SQL becomes a necessity.

One way to ensure the memory remains sufficient for your data is to practice data partitioning. In this process, old data that is no longer required as often as recent data, is separated and stored in other servers. There are various other ways of ensuring sufficient space which we will talk about in another blog.

Indices

Indices, or indexes, are known by most of us to be a useful tool in improving the accessing speed of the database. An important thing to remember is that their usefulness depends a lot on exclusivity, i.e., the ability to select a number of rows that match with specific index ranges or values. Also, the nature of the workload – specifically whether it is cached or not – determines how much it will benefit from the use of an index.

This is actually overlooked by even MySQL optimizer at present and may need to be checked by other MySQL SQL performance tuning tools. Workload indices have a chance of much quicker access even if the size of the data being accessed is as large as fifty percent of the entire number of rows, as long as they are in-memory. On the other hand, for disk IO bound access, you may have greater success in fetching data through a full table scan irrespective of the number of rows you are requesting access to.

Since indices can differ from each other in many ways, they need to be used differently in order to effectively use them. For instance, you can place them either in a well-organized manner or at random spots, resulting in significant changes in their speed. Innodb also includes clustered keys which work by merging data and index access – such keys end up conserving IO that will prove invaluable for workloads that are entirely disk-bound.

In Conclusion

Designing table structures smartly involves taking into consideration all the abilities and disabilities of MySQL. This is especially important if you have to handle different kinds of databases in your organization.

The main reason why your organization has different databases in the first place is because of their different capabilities and shortcomings. So, the same design concepts won’t bring the same results in say, MS SQL or Oracle that they did in MySQL and vice versa. The same is true for their storage engines – each can have a different effect on the performance.

Once you have applied the right application architecture to plan your tables, you will be able to create applications that can easily handle huge data sets on the basis of MySQL.

Proper MySQL SQL performance tuning involves optimizations that can greatly boost the rate at which indices are accessed or scanned. There already are tools by Tosska Technologies Limited for this purpose like Tosska SQL Tuning Expert (TSEM™) for MySQL which you can download and start using today. Contact our team for further information or enquiries.