How to Tune SQL Statement with Driving Path Control for MySQL?

sql tuning for MySQL

The following is an example shows a SQL statement with two potential table join paths.  “Employee to Department” and “Department to Employee” are potential driving paths which will be decided by MySQL SQL optimizer during SQL optimization stage.  

select  emp_id,emp_name,dpt_avg_salary
  from employee ,department
where emp_dept=dpt_id
and emp_dept like ‘A%’
and dpt_id like ‘A%’;

Here the following is the query plan selected by MySQL SQL optimizer in Tosska proprietary tree format, it takes 59 seconds to finish.

The query plan looks reasonable that uses DEPARTMENT’s Primary Key (DPT_ID) to fetch DEPARTMENT table first and then nested loop EMPLOYEE table by EMP_DEPT index, the speed of this query plan depends on the size of EMPLOYEE table and the records distribution according to the EMP_DEPT code.

If we want to change the driving path of the query plan from EMPLOYEE to DEPARTMENT, let me add a ifnull(dpt_id,dpt_id) dummy function to disable the DPT_ID index search, so it can artificially add cost to condition search DEPARTMENT table first. It means that using EMP_DEPT index search cost is relative cheaper now, so EMPLOYEE to DEPARTMENT driving path is probably be selected by MySQL SQL optimizer in the following:

select  emp_id,
        emp_name,
        dpt_avg_salary
from     employee,
        department
where  emp_dept = dpt_id
        and emp_dept like ‘A%’
        and ifnull(dpt_id,dpt_id) like ‘A%’

EMPLOYEE to DEPARTMENT driving path plan is generated by MySQL now and it takes only 18.8 seconds only to finish the query.

If we know that using the EMP_DEPT index is not that efficient due to the selectivity of “ like ‘A%’ “ condition may not high enough to utilize the index range scan. Let me add an additional ifnull(emp_dept,emp_dept) dummy function to disable the EMP_DEPT index range scan too in the following:

select  emp_id,
        emp_name,
        dpt_avg_salary
from     employee,
        department
where  emp_dept = dpt_id
        and ifnull(emp_dept,emp_dept) like ‘A%’
        and ifnull(dpt_id,dpt_id) like ‘A%’

Now, MySQL use full table scan of EMPLOYEE table to nested loop DEPARTMENT table. The speed is further improved to 15 seconds now.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the best rewrite is around 4 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statement with Multiple Union in Subquery for MySQL?

dba tuning

The following is an example shows a SQL statement with two union operator in a subquery. The SQL retrieve records from EMPLOYEE table that EMP_ID should satisfy with the union result set from two queries in a subquery.

select * from employee
  where emp_id IN
  (select emp_id from emp_subsidiary where emp_grade=1000
    union
    select emp_id from employee where emp_dept=‘AAA’)

Here the following are the query plans in Tosska proprietary tree format, it takes 3 minutes 27 seconds to finish.

The query plan shows a full table scan of EMPLOYEE table and the attached subquery will be executed for each of scanned record. So, you can see the query plan is very inefficient. If we know the union result set is small and it should be executed first, and then use EMP_ID index to retrieve EMPLOYEE table. Let me rewrite the Union subquery as a derived table expression in the following:

select *
from employee
where  emp_id in (select  emp_id
                       from    (select  emp_id
                             from     emp_subsidiary
                             where  emp_grade = 1000
                             union
                             select  emp_id
                             from     employee
                             where  emp_dept = ‘AAA’) DT1)

Now, you can see the Union subquery is executed first and use it to retrieve the EMPLOYEE table by EMP_ID index. The overall query is now become more reasonable and efficient.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 60 times faster than the original SQL.  There are some other rewrites with even better performance, but it is a little bit complicated to discuss in this short article, let’s discuss it in my coming blogs.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statement with IN List Bind Variables for MySQL?

dba tuning

The following is an example shows a SQL statement with a variable on the IN List operator. The SQL retrieve records from EMPLOYEE table that (EMP_ID,@1) should match any value in a set of values on the right-hand side.

select * from employee
where  (emp_id,@1) in ((1000000,’a’),(2000000,’b’),(3000000,’c’))

Here the following are the query plans in Tosska proprietary tree format, it takes 19 seconds to finish.

The query plan shows a full table scan of EMPLOYEE, it means MySQL cannot decompose the IN list syntax into a better syntax for cost evaluation and no index scan is used.

Let me rewrite the IN list into multiple OR conditions in the following:

select *
from     employee
where  (  (  emp_id = 1000000
           and @1 = ‘a’ )
           or ( emp_id = 2000000
              and @1 = ‘b’ )
           or ( emp_id = 3000000
              and @1 = ‘c’ ) )

Now, MySQL can utilize Single Row (constant) index search. The speed now is 0.0059 second and is much faster than original SQL.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3200 times faster than the original SQL.  

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune Delete SQL statement with subqueries for MySQL?

mysql query optimization

The following is an example shows a DELETE SQL statement with subquery. The SQL delete records from emp_subsidiary that should satisfy with three conditions shows in the following query.

delete from emp_subsidiary
where emp_id in
         (SELECT EMP_ID
              FROM EMPLOYEE
          WHERE emp_salary < 15000)
and emp_dept<‘D’
and emp_grade<1500

Here the following are the query plans in tabular format, it takes 8.88 seconds to finish.

Normally, DELETE SQL statements are difficult to tune due to the MySQL SQL optimizer generate a relative smaller plan space for DELETE statements compare to SELECT SQL statements. Simply speaking, there are not much alternative plans that MySQL will generate for you no matter how complicated SQL syntax you can rewrite for your DELETE statement.  But there is a loophole in MySQL version 8, which we have to aware of is the order of conditions listed in the DELETE statement. The following rewrite which reordered the filtering conditions and has the same query plan as the original SQL both in Tree Plan and Tabular Plan. But the speed is improved to 3.88 seconds.

delete from emp_subsidiary
where            emp_dept < ‘D’
                          and emp_grade < 1500

                          and emp_id in (select EMP_ID
                                 from   EMPLOYEE
                                 where  emp_salary < 15000)

Since there is no change in Tree Plan and Tabular Plan, we have to check the Visual Plan and found the following change in red box, it shows you that the Attached Condition’s execution order is changed and the time-consuming subquery is placed at the end of the Attached Condition. It means that either one of the first two conditions is false then the subquery is not necessary to execute. It is possibly can explain why the second DELETE statement is running much faster than the original SQL statement.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 2 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune CTE “WITH” SQL statement?

SQL Server database and SQL

CTE stands for common table expression. A CTE allows you to define a temporary named result set that available temporarily in the execution scope of a statement such as SELECT, INSERT, UPDATE, DELETE.

The following shows an example of a CTE in MySQL:

WITH
    cte1 AS (SELECT a, b FROM table1),
    cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;

The following is an example shows a SQL statement with CTE WITH expression. The retrieve records from EMPLOYEE that EMP_GRADE and EMP_DEPT have to satisfy the CTE selection result.

with DT1 as
(SELECT
                     EMP_GRADE GRADE,EMP_DEPT DEPT
  FROM     DEPARTMENT,
                     EMP_SUBSIDIARY
 WHERE   DPT_ID = EMP_DEPT
         AND   DPT_AVG_SALARY<500000
         AND   EMP_DEPT<‘D’
         and     EMP_SALARY<1200000
)
select * from  EMPLOYEE where (EMP_GRADE,EMP_DEPT) in
(select GRADE,DEPT from DT1)

Here the following are the query plan of this SQL, it takes 55.9 seconds to finish. The query shows a “Subquery2” with a Nested Loop from sub_emp_salary_inx to DEPARTMENT_PK.

I found the Rows=69606 of step 1 (1 Index Range Scan –  EMP_SUBSIDIARY –  sub_emp_salary_inx) is significant high, it is not reasonable for MySQL SQL optimizer to such path from EMP_SUBSIDIARY to DEPARTMENT. I believe that MySQL optimizer cannot do a good transitivity improvement DPT_ID.  So, I manually add a new condition as “and DPT_ID<‘D’“and a “group by 1,2” to narrow down the result set from CTE.

with DT1
           as (select   EMP_GRADE GRADE,
                                   EMP_DEPT  DEPT
                   from     DEPARTMENT,
                                   EMP_SUBSIDIARY
                 where    DPT_ID = EMP_DEPT
                                   and DPT_ID < ‘D’
                                   and DPT_AVG_SALARY < 500000
                                   and EMP_DEPT < ‘D’
                                   and EMP_SALARY < 1200000
                 group by 1,
                                    2)
  select *
  from         EMPLOYEE
  where      (EMP_GRADE,EMP_DEPT) in (select  GRADE,
                      DEPT
                     from     DT1)

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows correct driving path from DEPARTMENT to EMP_SUBSIDIARY, the estimated Rows now are closer to reality. There are two new steps of GROUP and DT1 (materialized) to narrow down the result set of CTE to future improve the performance.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 2 times faster than the original SQL. There are some other rewrites shown in this screen with comparable results too.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune “Not Exists” SQL statement?

MySQL database and SQL

The following is an example shows a SQL statement with “Not Exists” expression. The SQL retrieve records from emp_subsidiary that satisfy with the “Not  Exists” subquery.

select * from emp_subsidiary sub
where not (exists
(select ‘x’ from employee emp
      where emp.emp_salary<1000
           and emp.emp_dept=sub.emp_dept ))
and sub.emp_grade<1200

Here the following is the query plan of this SQL, it takes 13.36 seconds to finish. The query shows a Nested Loop from emp_subsidiary to the “Materialized Subquery2” from a full table scan of employee.

I found the Rows=2950038 of “Full Table Scan of employee” of step 2 is significantly high to constitute the materialized subquery2(view). In order to reduce the actual number of rows scan of this materialized subquery2(view). I moved the subquery of “Not Exists” to a CTE “WITH” statement and added a “group by 1” to reduce the result set of the CTE in the following.

with DT1
          as (select     emp.emp_dept
                from         employee emp
                where      emp.emp_salary < 1000
                group by 1)
    select *
    from     emp_subsidiary sub
    where  not ( sub.emp_dept in (select emp_dept
                                                                     from   DT1) )
                    and sub.emp_grade < 1200

The following is the query plan of the rewritten SQL and it takes only 2.32 seconds to complete. The new query plan shows an “Index Range Scan” to the Employee table plus a “GROUP” operation to narrow down the result set of the materialized subquery2.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 5 times faster than the original SQL. There are some other rewrites shown in this screen with even better performance, but they are more complicated in SQL syntax transformation and not suitable to discuss here in this short article.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune UPDATE SQL statement with IN subquery (I) ?

optimization of SQL queries

The following is an example shows an Update SQL statement with an “IN” subquery. It updates records from emp_subsidiary that satisfies the “IN” subquery conditions.

update emp_subsidiary set emp_name =‘Deleted Name’
where   emp_dept in
(select dpt_id from department
 where dpt_avg_salary<=6000);

Here the following is the query plan of this SQL, it takes 7.55 seconds to finish the update. The query shows an attached_subqueries attached to a Full Index Scan of emp_subsidiary table. It means that the 295344 rows in emp_subsidiary is going to check the subquery’s conditions one by one.

Let me rewrite the SQL into the following join update syntax.

update emp_subsidiary e1, department d1
set    e1.emp_name=‘Deleted Name’
where  e1.emp_dept = d1.dpt_id
       and d1.dpt_avg_salary <= 6000

The following is the query plan of the rewritten SQL and it takes only 1.22 seconds to complete. The new query plan shows a “Nested Loop” from Department table to Emp_subsidiary table, due to the condition “dpt_avg_salary <= 6000” has been executed before it is going to loop the Emp_subsidiary table, it saved a lot of unnecessary time to detect every record in Emp_subsidiary table.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 6 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL with OR statements?

Tosska Chat Box

It is common that the performance is not good if a SQL statement with OR conditions. Let’s have an example show you how to tune those SQL statements in certain situations. Here is an example SQL that extract records from EMPLOYEE table if (emp_grade < 1050 or emp_id<730000). Emp_grade and emp_id are indexed and they are not null field.

select * from employee
where emp_grade < 1050 or emp_id<730000

You can see MySQL SQL Optimizer use an Index Merge of emps_grade_inx and employee_pk to process the SQL, the performance is not good as expected since the result set is quite big for sort_union operation. It takes more than 40 seconds to finish the data retrieval. Let me rewrite the OR condition into the following UNION ALL statement, please make sure the emp_grade and emp_id are not null column, otherwise it may generate error result. The rewrite is simple that the first part extract data with emp_grade<1050, the second part of the UNION ALL retrieve records that satisfied with emp_id<730000, but it is not retrieved in the first part of the UNION ALL.

select    *
from       employee
where    emp_grade < 1050
union all
select    *
from       employee
where    not ( emp_grade < 1050 )
                  and emp_id < 730000

Here the following is the query plan of this SQL, it takes 12.46 seconds to finish. The query shows two “Index Range Scan” of EMPLOYEE_PK and EMPS_GRADE_INX to the employee table.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL. There are some other rewrites shown in this screen with comparable results too.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/