How to Tune SQL Statements with Rewrite and Hints Injection for MySQL?

sql tuning for MySQL

There are some SQL statements with performance problem have to be tuned by SQL syntax rewrite and Hints injection, it is a little bit difficult for SQL tuning newcomers to master this technique. Developers not only have to understand the relationship between SQL syntax and the final query plan generation but have to understand the usage of optimizer hints and its limitations. Sometimes these two tuning techniques application will affect each other in a complex SQL statement.

Here is a simple example SQL that retrieves data from EMPLOYEE and DEPARTMENT tables.

select  * from employee,department
where emp_dept=dpt_id
   and emp_dept<‘L’
   and emp_id<1500000
   and emp_salary= dpt_avg_salary
order by dpt_avg_salary

Here the following are the query plans of this SQL, it takes 7.7 seconds to finish. The query shows a “Full Table Scan Department” and nested loop Employee table with a Non-Unique Key Lookup EMPS_SALARY_INX.

You can see that this SQL cannot utilize index scan even though the dpt_dept is an indexed field. It is because the condition emp_dept<‘L’ is not explicitly induced the condition dpt_id < ‘L’ although emp_dept=dpt_id is also listed in the where clause.

To enable the index search of Department table, I explicitly add a condition dpt_id < ‘L’ to the SQL statement as the following:

select   *
from  employee,
     department
where  emp_dept = dpt_id
     and dpt_id < ‘L’
     and emp_dept < ‘L’
     and emp_id < 1500000
     and emp_salary = dpt_avg_salary
order by  dpt_avg_salary

Here is the query plan of the rewritten SQL and the execution time is reduced to 3.4 seconds. The new query plan shows that an Index Range Scan is used for the Department table and nested loop Employee table.

You may find that the nested loop to Employee by EMPS_SALARY_INX lookup may result into a lot of random access to the Employee table. Let me add a BKA hint to ask MySQL to use ‘Batched Key Access’ to join the two tables.

select   /*+ QB_NAME(QB1) BKA(`employee`@QB1) */ *
from  employee,
     department
where  emp_dept = dpt_id
     and dpt_id < ‘L’
     and emp_dept < ‘L’
     and emp_id < 1500000
     and emp_salary = dpt_avg_salary
order by  dpt_avg_salary

The new query plan shows a Batched Key Access is used to join Department and Employee tables, you can BAK information from MySQL manual for details, the new plan takes only 1.99 seconds and it is more than 3 times better than the original SQL syntax.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statements to Run SLOWER… but Make Users Feel BETTER (MySQL)?

MySQL database and SQL

Your end-users may keep on complaining about some functions of their database application are running slow, but you may found that those SQL statements are already reached their maximum speed in the current MySQL and hardware configuration. There may be no way to improve the SQL unless you are willing to upgrade your hardware. To make your users feel better, sometimes, you don’t have to tune your SQL to run faster but to tune your SQL to run slower for certain application’s SQL statements.

This is an example SQL that is used to display the information from tables Emp_subsidiary and Employee if they are satisfied with certain criteria. This SQL is executed as an online query and users have to wait for at least 5 seconds before any data will be shown on screen after the mouse click.

select  *
from    employee a,
         emp_subsidiary b
where   a.emp_id = b.emp_id
         and a.emp_grade < 1050
         and b.emp_salary < 5000000
order by a.emp_id

Here the following is the query plan and execution statistics of the SQL, it takes 5.48seconds to extract all 3645 records and the first records return time ”Response Time(Duration)” is 5.39 seconds. The query shows a “Full Table Scan b (emp_subsidiary)” to Nested-Loop “a (employee)” table, an ORDER operation is followed by sorting the returned data by emp_id. You can see there is a Sort Cost=7861.86 at the ORDER step on the query plan. It is the reason that users have to wait at least 5 seconds before they can see anything shows on the screen.

To reduce the sorting time of a.emp_id, since a.emp_id=b.emp_id, so I can rewrite the order by clause from “order by a.emp_id” to “order by b.emp_id”, MySQL now can eliminate the sorting time by using the EMPLOYEE_PK after the nested loop operation.

select  *
from    employee a,
         emp_subsidiary b
where   a.emp_id = b.emp_id
         and a.emp_grade < 1050
         and b.emp_salary < 5000000
order by b.emp_id

Although the overall Elapsed Time is higher in the new query plan, you can see that the response time is reduced from 5.397 seconds to 0.068, so the users can see the first page of information on the screen instantly and they don’t care whether there are 2 more seconds for all 3,645 records to be returned. That is why SQL tuning is an art rather than science when you are going to manage your users’ expectations.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statements with CONCAT Operator for MySQL?

oracle sql performance tuning

There may be some business requirements that need to compare concatenate strings and column with a given unknown length of the bind variable. Here is an example SQL that retrieves data from EMPLOYEE and DEPARTMENT tables where employee’s department ID must concatenate two strings before it is compared to an unknown length of variable @dpt_var

select * from employee,department
where concat(concat(‘A’,emp_dept),‘B’) = @dpt_var
and  emp_dept= dpt_id

Here the following are the query plans of this SQL, it takes 23.8 seconds to finish. The query shows a “Full Table Scan Employee” to nested loop Department table.

You can see that this SQL cannot utilize index scan even the emp_dept is an indexed field. Let me add a “force index(EMPS_DPT_INX) hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_dept >= ” , it is an always true condition that emp_dept should be greater or equal to a smallest empty character. It is to fool MySQL SQL optimizer that emp_dept’s index is a reasonable step.

select  *
from  employee force index(EMPS_DPT_INX),
     department
where  concat(concat(‘A’,emp_dept),‘B’) = @dpt_var
     and emp_dept >= ”
     and emp_dept = dpt_id

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used for Employee table first and then nested loop Department table.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL statement with Transitive Dependency Improvement for MySQL?

oracle query optimizer too

The following is an example shows a SQL statement with two conditions “emp_dept=dpt_id and emp_dept<‘L’”

select  *  from employee,department
where  emp_dept=dpt_id
  and  emp_dept<‘L’
  and  emp_id<1500000
  and  emp_salary= dpt_avg_salary
order    by  dpt_avg_salary

Here the following is the query plan of this SQL in Tosska proprietary tree format, it takes 8.84 seconds to finish.

The query plan looks reasonable that shows a full table scan of DEPARTMENT to nested-loop EMPLOYEE table, the records in EMPLOYEE table being nested-loop must satisfy with the condition “emp_id<1500000” and the corresponding index EMPS_SALARY_INX is also used. Due to the number of records in the first driving table in a Nested Loop Join is very critical to the join performance, we should find a way to narrow down the number of result records of DEPARTMENT table before it is used to nested-loop EMPLOYEE table.

As the conditions “emp_dept=dpt_id and emp_dept<‘L’”, it implies that “dpt_id < ‘L’” is also true, let me add this extra condition to the SQL, it helps MySQL SQL optimizer to make a better decision with more information provided by the new SQL syntax, this technique is especially useful for MySQL database.
Remark:
Oracle or MS SQL Server are doing very good on their internal Transitive Dependency Improvement in their SQL optimizer already, so this technique may not work for Oracle and MS SQL Server.

select      *
from        employee,
       department
where     emp_dept = dpt_id
    and dpt_id < ‘L’
    and emp_dept < ‘L’
    and emp_id < 1500000
    and emp_salary = dpt_avg_salary
order by dpt_avg_salary

Let’s see the DEPARTMENT is now being filtered by the new condition “dpt_id < ‘L’ “ with an index range scan. You can see the estimated Rows 401 of DEPARTMENT table is now being trimmed down to 176. The rewritten SQL now takes only 3.8 seconds with such a simple change in syntax.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that this rewrite is more than 2 times faster than the original SQL with such an easy change in the syntax.
https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statement with Driving Path Control for MySQL?

sql tuning for MySQL

The following is an example shows a SQL statement with two potential table join paths.  “Employee to Department” and “Department to Employee” are potential driving paths which will be decided by MySQL SQL optimizer during SQL optimization stage.  

select  emp_id,emp_name,dpt_avg_salary
  from employee ,department
where emp_dept=dpt_id
and emp_dept like ‘A%’
and dpt_id like ‘A%’;

Here the following is the query plan selected by MySQL SQL optimizer in Tosska proprietary tree format, it takes 59 seconds to finish.

The query plan looks reasonable that uses DEPARTMENT’s Primary Key (DPT_ID) to fetch DEPARTMENT table first and then nested loop EMPLOYEE table by EMP_DEPT index, the speed of this query plan depends on the size of EMPLOYEE table and the records distribution according to the EMP_DEPT code.

If we want to change the driving path of the query plan from EMPLOYEE to DEPARTMENT, let me add a ifnull(dpt_id,dpt_id) dummy function to disable the DPT_ID index search, so it can artificially add cost to condition search DEPARTMENT table first. It means that using EMP_DEPT index search cost is relative cheaper now, so EMPLOYEE to DEPARTMENT driving path is probably be selected by MySQL SQL optimizer in the following:

select  emp_id,
        emp_name,
        dpt_avg_salary
from     employee,
        department
where  emp_dept = dpt_id
        and emp_dept like ‘A%’
        and ifnull(dpt_id,dpt_id) like ‘A%’

EMPLOYEE to DEPARTMENT driving path plan is generated by MySQL now and it takes only 18.8 seconds only to finish the query.

If we know that using the EMP_DEPT index is not that efficient due to the selectivity of “ like ‘A%’ “ condition may not high enough to utilize the index range scan. Let me add an additional ifnull(emp_dept,emp_dept) dummy function to disable the EMP_DEPT index range scan too in the following:

select  emp_id,
        emp_name,
        dpt_avg_salary
from     employee,
        department
where  emp_dept = dpt_id
        and ifnull(emp_dept,emp_dept) like ‘A%’
        and ifnull(dpt_id,dpt_id) like ‘A%’

Now, MySQL use full table scan of EMPLOYEE table to nested loop DEPARTMENT table. The speed is further improved to 15 seconds now.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the best rewrite is around 4 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statement with Multiple Union in Subquery for MySQL?

dba tuning

The following is an example shows a SQL statement with two union operator in a subquery. The SQL retrieve records from EMPLOYEE table that EMP_ID should satisfy with the union result set from two queries in a subquery.

select * from employee
  where emp_id IN
  (select emp_id from emp_subsidiary where emp_grade=1000
    union
    select emp_id from employee where emp_dept=‘AAA’)

Here the following are the query plans in Tosska proprietary tree format, it takes 3 minutes 27 seconds to finish.

The query plan shows a full table scan of EMPLOYEE table and the attached subquery will be executed for each of scanned record. So, you can see the query plan is very inefficient. If we know the union result set is small and it should be executed first, and then use EMP_ID index to retrieve EMPLOYEE table. Let me rewrite the Union subquery as a derived table expression in the following:

select *
from employee
where  emp_id in (select  emp_id
                       from    (select  emp_id
                             from     emp_subsidiary
                             where  emp_grade = 1000
                             union
                             select  emp_id
                             from     employee
                             where  emp_dept = ‘AAA’) DT1)

Now, you can see the Union subquery is executed first and use it to retrieve the EMPLOYEE table by EMP_ID index. The overall query is now become more reasonable and efficient.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 60 times faster than the original SQL.  There are some other rewrites with even better performance, but it is a little bit complicated to discuss in this short article, let’s discuss it in my coming blogs.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statement with IN List Bind Variables for MySQL?

dba tuning

The following is an example shows a SQL statement with a variable on the IN List operator. The SQL retrieve records from EMPLOYEE table that (EMP_ID,@1) should match any value in a set of values on the right-hand side.

select * from employee
where  (emp_id,@1) in ((1000000,’a’),(2000000,’b’),(3000000,’c’))

Here the following are the query plans in Tosska proprietary tree format, it takes 19 seconds to finish.

The query plan shows a full table scan of EMPLOYEE, it means MySQL cannot decompose the IN list syntax into a better syntax for cost evaluation and no index scan is used.

Let me rewrite the IN list into multiple OR conditions in the following:

select *
from     employee
where  (  (  emp_id = 1000000
           and @1 = ‘a’ )
           or ( emp_id = 2000000
              and @1 = ‘b’ )
           or ( emp_id = 3000000
              and @1 = ‘c’ ) )

Now, MySQL can utilize Single Row (constant) index search. The speed now is 0.0059 second and is much faster than original SQL.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3200 times faster than the original SQL.  

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune Delete SQL statement with subqueries for MySQL?

mysql query optimization

The following is an example shows a DELETE SQL statement with subquery. The SQL delete records from emp_subsidiary that should satisfy with three conditions shows in the following query.

delete from emp_subsidiary
where emp_id in
         (SELECT EMP_ID
              FROM EMPLOYEE
          WHERE emp_salary < 15000)
and emp_dept<‘D’
and emp_grade<1500

Here the following are the query plans in tabular format, it takes 8.88 seconds to finish.

Normally, DELETE SQL statements are difficult to tune due to the MySQL SQL optimizer generate a relative smaller plan space for DELETE statements compare to SELECT SQL statements. Simply speaking, there are not much alternative plans that MySQL will generate for you no matter how complicated SQL syntax you can rewrite for your DELETE statement.  But there is a loophole in MySQL version 8, which we have to aware of is the order of conditions listed in the DELETE statement. The following rewrite which reordered the filtering conditions and has the same query plan as the original SQL both in Tree Plan and Tabular Plan. But the speed is improved to 3.88 seconds.

delete from emp_subsidiary
where            emp_dept < ‘D’
                          and emp_grade < 1500

                          and emp_id in (select EMP_ID
                                 from   EMPLOYEE
                                 where  emp_salary < 15000)

Since there is no change in Tree Plan and Tabular Plan, we have to check the Visual Plan and found the following change in red box, it shows you that the Attached Condition’s execution order is changed and the time-consuming subquery is placed at the end of the Attached Condition. It means that either one of the first two conditions is false then the subquery is not necessary to execute. It is possibly can explain why the second DELETE statement is running much faster than the original SQL statement.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 2 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/