How to Tune SQL Statements with CONCAT Operator for MySQL?

Some business requirements call for comparing a column that has been concatenated with other strings against a bind variable of unknown length. The example SQL below retrieves data from the EMPLOYEE and DEPARTMENT tables; the employee's department ID is concatenated with two strings before it is compared to the variable @dpt_var, whose length is not known in advance.

select * from employee,department
where concat(concat('A',emp_dept),'B') = @dpt_var
and  emp_dept= dpt_id

The query plan of this SQL is shown below; it takes 23.8 seconds to finish. The plan shows a full table scan of EMPLOYEE driving a nested loop into the DEPARTMENT table.

You can see that this SQL cannot use an index scan even though emp_dept is an indexed field, because the column is wrapped in concat(). I first added a force index(EMPS_DPT_INX) hint to the SQL in the hope that it would help the MySQL SQL optimizer use an index scan, but the hint alone failed to enable it. So I added one more dummy condition, emp_dept >= ''. This condition is true for every non-null emp_dept, since any string is greater than or equal to the empty string, and a row with a null emp_dept could not satisfy the concat comparison or the join anyway. Its purpose is to give the MySQL SQL optimizer a range condition on emp_dept so that using the index looks like a reasonable step.

select  *
from  employee force index(EMPS_DPT_INX),
     department
where  concat(concat('A',emp_dept),'B') = @dpt_var
     and emp_dept >= ''
     and emp_dept = dpt_id

Here is the query plan of the rewritten SQL, which runs faster. The new plan shows an Index Range Scan on the EMPLOYEE table first, followed by a nested loop into the DEPARTMENT table.
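
To check that the dummy condition does not change the result, the two statements can be compared on a small data set. The following is a minimal Python sketch and was not part of the original tuning session. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, made-up rows, and the || operator in place of concat(). The force index hint is left out because it is MySQL syntax. The sketch only checks that both forms return the same rows; it says nothing about MySQL query plans or timings.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table department (dpt_id text primary key);
    create table employee (emp_id integer primary key, emp_dept text);
    insert into department values ('X1'), ('X2');
    insert into employee values (1, 'X1'), (2, 'X2'), (3, 'X1'), (4, NULL);
""")

# Original predicate; '||' replaces MySQL's concat().
original = """
    select emp_id from employee, department
    where 'A' || emp_dept || 'B' = :dpt_var
      and emp_dept = dpt_id
    order by emp_id
"""
# Same statement plus the always-true dummy range condition.
rewritten = """
    select emp_id from employee, department
    where 'A' || emp_dept || 'B' = :dpt_var
      and emp_dept >= ''
      and emp_dept = dpt_id
    order by emp_id
"""

params = {"dpt_var": "AX1B"}
rows_original = con.execute(original, params).fetchall()
rows_rewritten = con.execute(rewritten, params).fetchall()
print(rows_original == rows_rewritten)
```

The row with a null emp_dept is included on purpose: it is rejected by both forms, so the dummy condition does not remove anything the original would have returned.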

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 3 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL statement with Transitive Dependency Improvement for MySQL?

The following example shows a SQL statement with two conditions, “emp_dept=dpt_id and emp_dept<'L'”:

select  *  from employee,department
where  emp_dept=dpt_id
  and  emp_dept<'L'
  and  emp_id<1500000
  and  emp_salary= dpt_avg_salary
order    by  dpt_avg_salary

The query plan of this SQL, in Tosska proprietary tree format, is shown below; it takes 8.84 seconds to finish.

The query plan looks reasonable: it shows a full table scan of DEPARTMENT driving a nested loop into the EMPLOYEE table. The EMPLOYEE records fetched in the nested loop must satisfy the condition “emp_id<1500000”, and the index EMPS_SALARY_INX is used for the lookup. Because the number of records coming from the driving table of a Nested Loop Join is critical to join performance, we should find a way to reduce the number of DEPARTMENT records before they are used to probe the EMPLOYEE table.

Because the conditions “emp_dept=dpt_id and emp_dept<'L'” both hold, “dpt_id < 'L'” must also be true. Let me add this extra condition to the SQL. The additional information in the new SQL syntax helps the MySQL SQL optimizer make a better decision. This technique is especially useful for MySQL.
Remark:
Oracle and MS SQL Server already do a very good job of Transitive Dependency Improvement inside their SQL optimizers, so this technique may not help on Oracle or MS SQL Server.

select      *
from        employee,
       department
where     emp_dept = dpt_id
    and dpt_id < 'L'
    and emp_dept < 'L'
    and emp_id < 1500000
    and emp_salary = dpt_avg_salary
order by dpt_avg_salary

The DEPARTMENT table is now filtered by the new condition “dpt_id < 'L'” with an index range scan. The estimated Rows of the DEPARTMENT table is trimmed down from 401 to 176. The rewritten SQL now takes only 3.8 seconds after this simple change in syntax.
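
The added condition is implied by the existing ones, so it should not change the result. Here is a small Python sketch of that check. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL and uses made-up rows, so it confirms only that the two forms return the same rows, not the plan change or the timings.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table department (dpt_id text primary key, dpt_avg_salary integer);
    create table employee (emp_id integer primary key, emp_dept text,
                           emp_salary integer);
    insert into department values ('A', 100), ('K', 200), ('M', 300);
    insert into employee values
        (1, 'A', 100), (2, 'K', 200), (3, 'M', 300),
        (4, 'K', 999), (1500001, 'A', 100);
""")

# {extra} is empty for the original SQL and holds the implied condition
# for the rewrite. Ordering by emp_id (instead of dpt_avg_salary as in
# the article) only makes the comparison deterministic.
template = """
    select emp_id from employee, department
    where emp_dept = dpt_id
      {extra}
      and emp_dept < 'L'
      and emp_id < 1500000
      and emp_salary = dpt_avg_salary
    order by emp_id
"""
rows_original = con.execute(template.format(extra="")).fetchall()
rows_rewritten = con.execute(
    template.format(extra="and dpt_id < 'L'")).fetchall()
print(rows_original == rows_rewritten)
```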

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that this rewrite is more than 2 times faster than the original SQL with such an easy change in syntax.

How to Tune SQL Statement with Driving Path Control for MySQL?

The following example shows a SQL statement with two potential table join paths. “Employee to Department” and “Department to Employee” are the potential driving paths, and the MySQL SQL optimizer chooses between them during the SQL optimization stage.

select  emp_id,emp_name,dpt_avg_salary
  from employee ,department
where emp_dept=dpt_id
and emp_dept like 'A%'
and dpt_id like 'A%';

The query plan selected by the MySQL SQL optimizer, in Tosska proprietary tree format, is shown below; it takes 59 seconds to finish.

The query plan looks reasonable: it uses DEPARTMENT's primary key (DPT_ID) to fetch the DEPARTMENT table first and then nested-loops into the EMPLOYEE table through the EMP_DEPT index. The speed of this query plan depends on the size of the EMPLOYEE table and on how the records are distributed across the EMP_DEPT codes.

To change the driving path of the query plan to EMPLOYEE to DEPARTMENT, let me wrap dpt_id in a dummy function, ifnull(dpt_id,dpt_id), to disable the DPT_ID index search. This artificially adds cost to searching the DEPARTMENT table first. The EMP_DEPT index search is now relatively cheaper, so the MySQL SQL optimizer will probably select the EMPLOYEE to DEPARTMENT driving path for the following SQL:

select  emp_id,
        emp_name,
        dpt_avg_salary
from     employee,
        department
where  emp_dept = dpt_id
        and emp_dept like 'A%'
        and ifnull(dpt_id,dpt_id) like 'A%'

MySQL now generates an EMPLOYEE to DEPARTMENT driving path plan, and the query takes only 18.8 seconds to finish.

If we know that the EMP_DEPT index is not that efficient, because the “like 'A%'” condition may not be selective enough to benefit from an index range scan, we can add another dummy function, ifnull(emp_dept,emp_dept), to disable the EMP_DEPT index range scan too:

select  emp_id,
        emp_name,
        dpt_avg_salary
from     employee,
        department
where  emp_dept = dpt_id
        and ifnull(emp_dept,emp_dept) like 'A%'
        and ifnull(dpt_id,dpt_id) like 'A%'

Now MySQL uses a full table scan of the EMPLOYEE table driving a nested loop into the DEPARTMENT table. The speed improves further to 15 seconds.
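
ifnull(x,x) always returns x itself, so wrapping a column this way should leave the result untouched while hiding the bare column from the optimizer. The short Python sketch below checks the first half of that statement only. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL and uses made-up rows; it cannot show how MySQL's driving path reacts.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table department (dpt_id text primary key, dpt_avg_salary integer);
    create table employee (emp_id integer primary key, emp_name text,
                           emp_dept text);
    insert into department values ('A01', 100), ('A02', 200), ('B01', 300);
    insert into employee values
        (1, 'Ann', 'A01'), (2, 'Bob', 'B01'), (3, 'Cy', 'A02'), (4, 'Di', NULL);
""")

# One template, three variants: bare columns, dpt_id wrapped, both wrapped.
template = """
    select emp_id from employee, department
    where emp_dept = dpt_id
      and {emp_pred} like 'A%'
      and {dpt_pred} like 'A%'
    order by emp_id
"""
variants = {
    "original": template.format(
        emp_pred="emp_dept", dpt_pred="dpt_id"),
    "dpt_id_wrapped": template.format(
        emp_pred="emp_dept", dpt_pred="ifnull(dpt_id,dpt_id)"),
    "both_wrapped": template.format(
        emp_pred="ifnull(emp_dept,emp_dept)", dpt_pred="ifnull(dpt_id,dpt_id)"),
}
results = {name: con.execute(sql).fetchall() for name, sql in variants.items()}
print(results)
```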

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the best rewrite is around 4 times faster than the original SQL.


How to Tune SQL Statement with Multiple Union in Subquery for MySQL?

The following example shows a SQL statement with a UNION of two queries in a subquery. The SQL retrieves records from the EMPLOYEE table whose EMP_ID appears in the union result set of the two queries in the subquery.

select * from employee
  where emp_id IN
  (select emp_id from emp_subsidiary where emp_grade=1000
    union
    select emp_id from employee where emp_dept='AAA')

The query plan, in Tosska proprietary tree format, is shown below; the SQL takes 3 minutes 27 seconds to finish.

The query plan shows a full table scan of the EMPLOYEE table, with the attached subquery executed for each scanned record, so the plan is very inefficient. If we know that the union result set is small, it should be executed first and then used to retrieve the EMPLOYEE table through the EMP_ID index. Let me rewrite the UNION subquery as a derived table expression:

select *
from employee
where  emp_id in (select  emp_id
                       from    (select  emp_id
                             from     emp_subsidiary
                             where  emp_grade = 1000
                             union
                             select  emp_id
                             from     employee
                             where  emp_dept = 'AAA') DT1)

Now the UNION subquery is executed first, and its result is used to retrieve the EMPLOYEE table through the EMP_ID index. The overall query plan is now more reasonable and efficient.
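
Wrapping the UNION in a derived table only adds a layer of select around the same set of EMP_ID values, so the result should be identical. The Python sketch below checks this on made-up rows. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL, so it verifies the result set only and does not reproduce MySQL's plan.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table employee (emp_id integer primary key, emp_dept text);
    create table emp_subsidiary (emp_id integer primary key, emp_grade integer);
    insert into employee values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'AAA');
    insert into emp_subsidiary values (2, 1000), (3, 2000), (4, 1000), (9, 1000);
""")

# UNION written directly inside the IN subquery.
original = """
    select emp_id from employee
    where emp_id in
      (select emp_id from emp_subsidiary where emp_grade = 1000
       union
       select emp_id from employee where emp_dept = 'AAA')
    order by emp_id
"""
# Same UNION wrapped in a derived table named DT1.
rewritten = """
    select emp_id from employee
    where emp_id in (select emp_id
                     from (select emp_id from emp_subsidiary
                           where emp_grade = 1000
                           union
                           select emp_id from employee
                           where emp_dept = 'AAA') DT1)
    order by emp_id
"""
rows_original = con.execute(original).fetchall()
rows_rewritten = con.execute(rewritten).fetchall()
print(rows_original == rows_rewritten)
```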

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 60 times faster than the original SQL. There are other rewrites with even better performance, but they are a little too complicated for this short article; I will discuss them in coming blogs.


How to Tune SQL Statement with IN List Bind Variables for MySQL?

The following example shows a SQL statement with a variable in an IN list. The SQL retrieves records from the EMPLOYEE table where (EMP_ID,@1) matches any of the value pairs on the right-hand side.

select * from employee
where  (emp_id,@1) in ((1000000,'a'),(2000000,'b'),(3000000,'c'))

The query plan, in Tosska proprietary tree format, is shown below; the SQL takes 19 seconds to finish.

The query plan shows a full table scan of EMPLOYEE. MySQL cannot decompose the IN list into a form that it can cost more effectively, so no index scan is used.

Let me rewrite the IN list into multiple OR conditions:

select *
from     employee
where  (  (  emp_id = 1000000
           and @1 = 'a' )
           or ( emp_id = 2000000
              and @1 = 'b' )
           or ( emp_id = 3000000
              and @1 = 'c' ) )

Now MySQL can use a Single Row (constant) index search. The rewritten SQL takes 0.0059 seconds, which is much faster than the original SQL.
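
The OR form spells out exactly what the row-value IN list means, pair by pair. The Python sketch below compares the two forms for several values of the variable. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL, made-up rows, and a named parameter :v in place of @1. SQLite accepts a row-value IN only against a subquery, so the list is written with VALUES there; that is an adaptation for the sketch, not MySQL syntax.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table employee (emp_id integer primary key, emp_name text);
    insert into employee values
        (1000000, 'x'), (2000000, 'y'), (3000000, 'z'), (4000000, 'w');
""")

# Row-value IN list; ':v' plays the role of the article's @1.
original = """
    select emp_id from employee
    where (emp_id, :v) in (values (1000000, 'a'), (2000000, 'b'), (3000000, 'c'))
    order by emp_id
"""
# The same condition expanded into OR branches.
rewritten = """
    select emp_id from employee
    where ((emp_id = 1000000 and :v = 'a')
        or (emp_id = 2000000 and :v = 'b')
        or (emp_id = 3000000 and :v = 'c'))
    order by emp_id
"""

# Try each listed value plus one ('d') that matches nothing.
results = {}
for v in ("a", "b", "c", "d"):
    results[v] = (con.execute(original, {"v": v}).fetchall(),
                  con.execute(rewritten, {"v": v}).fetchall())
print(all(o == r for o, r in results.values()))
```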

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 3200 times faster than the original SQL.


How to Tune Delete SQL statement with subqueries for MySQL?

The following example shows a DELETE SQL statement with a subquery. The SQL deletes the records from emp_subsidiary that satisfy the three conditions shown in the following query.

delete from emp_subsidiary
where emp_id in
         (SELECT EMP_ID
              FROM EMPLOYEE
          WHERE emp_salary < 15000)
and emp_dept<'D'
and emp_grade<1500

The query plan, in tabular format, is shown below; the SQL takes 8.88 seconds to finish.

Normally, DELETE SQL statements are difficult to tune because the MySQL SQL optimizer generates a relatively small plan space for DELETE statements compared with SELECT statements. Simply put, MySQL will not generate many alternative plans no matter how elaborately you rewrite the syntax of your DELETE statement. But there is a loophole in MySQL version 8 that we have to be aware of: the order of the conditions listed in the DELETE statement. The following rewrite reorders the filtering conditions and has the same query plan as the original SQL in both the Tree Plan and the Tabular Plan, yet the time improves to 3.88 seconds.

delete from emp_subsidiary
where  emp_dept < 'D'
       and emp_grade < 1500
       and emp_id in (select EMP_ID
                      from   EMPLOYEE
                      where  emp_salary < 15000)

Since there is no change in the Tree Plan or the Tabular Plan, we have to check the Visual Plan, where the change is highlighted in a red box: the execution order of the Attached Condition has changed, and the time-consuming subquery is now placed at the end of the Attached Condition. This means that if either of the first two conditions is false, the subquery does not need to be executed. This possibly explains why the second DELETE statement runs much faster than the original SQL statement.
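
The effect of putting the expensive check last can be illustrated outside the database. The following is a plain Python simulation, not MySQL: it assumes conditions are evaluated left to right with short-circuiting, uses made-up rows, and counts how often a stand-in for the subquery check runs under each ordering.

```python
# Made-up rows of (emp_id, emp_dept, emp_grade); not the article's data.
rows = [(1, 'A', 1000), (2, 'E', 1000), (3, 'B', 2000),
        (4, 'C', 1200), (5, 'Z', 9000)]
low_salary_ids = {1, 3, 4}  # hypothetical stand-in for the subquery result

calls = {"subquery_first": 0, "subquery_last": 0}

def in_subquery(emp_id, key):
    """Stand-in for the expensive subquery check; counts its invocations."""
    calls[key] += 1
    return emp_id in low_salary_ids

# Original order: the subquery check is evaluated first for every row.
first = [r[0] for r in rows
         if in_subquery(r[0], "subquery_first") and r[1] < 'D' and r[2] < 1500]

# Reordered: the cheap filters run first, so 'and' short-circuits and the
# subquery check runs only for rows that passed both of them.
last = [r[0] for r in rows
        if r[1] < 'D' and r[2] < 1500 and in_subquery(r[0], "subquery_last")]

print(first == last, calls)
```

Both orderings select the same rows; only the number of expensive checks differs, which matches the explanation given for the Visual Plan change.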

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 2 times faster than the original SQL.


How to Tune CTE “WITH” SQL statement?

CTE stands for common table expression. A CTE lets you define a temporary named result set that is available only within the execution scope of a single statement such as SELECT, INSERT, UPDATE, or DELETE.

The following shows an example of a CTE in MySQL:

WITH
    cte1 AS (SELECT a, b FROM table1),
    cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;

The following example shows a SQL statement with a CTE WITH expression. It retrieves records from EMPLOYEE whose EMP_GRADE and EMP_DEPT match a row in the CTE result.

with DT1 as
(SELECT EMP_GRADE GRADE,
        EMP_DEPT DEPT
 FROM   DEPARTMENT,
        EMP_SUBSIDIARY
 WHERE  DPT_ID = EMP_DEPT
   AND  DPT_AVG_SALARY < 500000
   AND  EMP_DEPT < 'D'
   AND  EMP_SALARY < 1200000
)
select * from EMPLOYEE where (EMP_GRADE,EMP_DEPT) in
(select GRADE,DEPT from DT1)

The query plan of this SQL is shown below; it takes 55.9 seconds to finish. The plan shows a “Subquery2” with a Nested Loop from sub_emp_salary_inx to DEPARTMENT_PK.

I found that Rows=69606 in step 1 (1 Index Range Scan – EMP_SUBSIDIARY – sub_emp_salary_inx) is significantly high, so it is not reasonable for the MySQL SQL optimizer to choose the path from EMP_SUBSIDIARY to DEPARTMENT. I believe the MySQL optimizer cannot do a good transitivity improvement on DPT_ID. So I manually added a new condition, “and DPT_ID<'D'”, and a “group by 1,2” to narrow down the result set of the CTE.

with DT1
     as (select   EMP_GRADE GRADE,
                  EMP_DEPT  DEPT
         from     DEPARTMENT,
                  EMP_SUBSIDIARY
         where    DPT_ID = EMP_DEPT
                  and DPT_ID < 'D'
                  and DPT_AVG_SALARY < 500000
                  and EMP_DEPT < 'D'
                  and EMP_SALARY < 1200000
         group by 1,
                  2)
select *
from   EMPLOYEE
where  (EMP_GRADE,EMP_DEPT) in (select GRADE,
                                       DEPT
                                from   DT1)

Here is the query plan of the rewritten SQL, which runs faster. The new plan shows the correct driving path from DEPARTMENT to EMP_SUBSIDIARY, and the estimated Rows are now closer to reality. There are two new steps, GROUP and DT1 (materialized), which narrow down the result set of the CTE to further improve performance.
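
Both changes are meant to be result-preserving: the extra DPT_ID condition is implied by the join, and “group by 1,2” only removes duplicate (GRADE, DEPT) pairs, which an IN test ignores anyway. The Python sketch below checks this on made-up rows. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL, so it shows the smaller CTE result and the unchanged final result, not MySQL's plan.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table department (dpt_id text primary key, dpt_avg_salary integer);
    create table emp_subsidiary (emp_id integer primary key, emp_dept text,
                                 emp_grade integer, emp_salary integer);
    create table employee (emp_id integer primary key, emp_grade integer,
                           emp_dept text);
    insert into department values ('A', 1000), ('B', 900000), ('E', 1000);
    insert into emp_subsidiary values
        (1, 'A', 10, 5000), (2, 'A', 10, 6000), (3, 'A', 20, 5000000),
        (4, 'B', 30, 100), (5, 'E', 40, 100);
    insert into employee values
        (100, 10, 'A'), (101, 20, 'A'), (102, 10, 'B'), (103, 10, 'A');
""")

cte_original = """
    with DT1 as (select emp_grade grade, emp_dept dept
                 from department, emp_subsidiary
                 where dpt_id = emp_dept
                   and dpt_avg_salary < 500000
                   and emp_dept < 'D'
                   and emp_salary < 1200000)
"""
cte_rewritten = """
    with DT1 as (select emp_grade grade, emp_dept dept
                 from department, emp_subsidiary
                 where dpt_id = emp_dept
                   and dpt_id < 'D'
                   and dpt_avg_salary < 500000
                   and emp_dept < 'D'
                   and emp_salary < 1200000
                 group by 1, 2)
"""
outer_query = """
    select emp_id from employee
    where (emp_grade, emp_dept) in (select grade, dept from DT1)
    order by emp_id
"""
count_query = "select count(*) from DT1"

rows_original = con.execute(cte_original + outer_query).fetchall()
rows_rewritten = con.execute(cte_rewritten + outer_query).fetchall()
cte_size_original = con.execute(cte_original + count_query).fetchone()[0]
cte_size_rewritten = con.execute(cte_rewritten + count_query).fetchone()[0]
print(rows_original == rows_rewritten, cte_size_original, cte_size_rewritten)
```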

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 2 times faster than the original SQL. Some other rewrites shown in this screen give comparable results too.


How to Tune “Not Exists” SQL statement?

The following example shows a SQL statement with a “Not Exists” expression. The SQL retrieves records from emp_subsidiary that satisfy the “Not Exists” subquery.

select * from emp_subsidiary sub
where not (exists
(select 'x' from employee emp
      where emp.emp_salary<1000
           and emp.emp_dept=sub.emp_dept ))
and sub.emp_grade<1200

The query plan of this SQL is shown below; it takes 13.36 seconds to finish. The plan shows a Nested Loop from emp_subsidiary to the “Materialized Subquery2”, which is built from a full table scan of employee.

I found that Rows=2950038 for the “Full Table Scan of employee” in step 2 is significantly high for building the materialized subquery2 (view). To reduce the actual number of rows in this materialized subquery2 (view), I moved the “Not Exists” subquery into a CTE “WITH” statement and added a “group by 1” to reduce the result set of the CTE, as follows.

with DT1
     as (select   emp.emp_dept
         from     employee emp
         where    emp.emp_salary < 1000
         group by 1)
select *
from   emp_subsidiary sub
where  not ( sub.emp_dept in (select emp_dept
                              from   DT1) )
       and sub.emp_grade < 1200

The following is the query plan of the rewritten SQL, which takes only 2.32 seconds to complete. The new plan shows an “Index Range Scan” on the EMPLOYEE table plus a “GROUP” operation to narrow down the result set of the materialized subquery2.
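
One point worth checking before using this rewrite is how it behaves with NULLs, because NOT IN and NOT EXISTS treat them differently. The Python sketch below is a minimal check under assumptions of my own: SQLite (Python's built-in sqlite3 module) stands in for MySQL, the rows are made up, and the CTE column is given an explicit alias. It suggests the rewrite is safe only when emp_dept contains no NULLs, a condition the example data presumably meets.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table employee (emp_id integer primary key, emp_dept text,
                           emp_salary integer);
    create table emp_subsidiary (emp_id integer primary key, emp_dept text,
                                 emp_grade integer);
    insert into employee values
        (1, 'A', 500), (2, 'A', 700), (3, 'B', 5000), (4, 'C', 900);
    insert into emp_subsidiary values
        (10, 'A', 1000), (11, 'B', 1000), (12, 'B', 1300), (13, 'D', 1100);
""")

original = """
    select emp_id from emp_subsidiary sub
    where not (exists (select 'x' from employee emp
                       where emp.emp_salary < 1000
                         and emp.emp_dept = sub.emp_dept))
      and sub.emp_grade < 1200
    order by emp_id
"""
rewritten = """
    with DT1 as (select emp.emp_dept as emp_dept
                 from employee emp
                 where emp.emp_salary < 1000
                 group by 1)
    select emp_id from emp_subsidiary sub
    where not (sub.emp_dept in (select emp_dept from DT1))
      and sub.emp_grade < 1200
    order by emp_id
"""

# Without NULLs in emp_dept the two forms agree.
rows_original = con.execute(original).fetchall()
rows_rewritten = con.execute(rewritten).fetchall()

# Add a low-salary employee with a NULL department: NOT IN now evaluates
# to NULL for every non-matching row, so the rewrite returns nothing.
con.execute("insert into employee values (5, NULL, 100)")
rows_original_null = con.execute(original).fetchall()
rows_rewritten_null = con.execute(rewritten).fetchall()
print(rows_original == rows_rewritten, rows_original_null, rows_rewritten_null)
```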

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 5 times faster than the original SQL. Some other rewrites shown in this screen perform even better, but their SQL syntax transformations are more complicated and not suitable for discussion in this short article.


How to Tune UPDATE SQL statement with IN subquery (I) ?

The following example shows an UPDATE SQL statement with an “IN” subquery. It updates the records in emp_subsidiary that satisfy the “IN” subquery condition.

update emp_subsidiary set emp_name ='Deleted Name'
where   emp_dept in
(select dpt_id from department
 where dpt_avg_salary<=6000);

The query plan of this SQL is shown below; the update takes 7.55 seconds to finish. The plan shows attached_subqueries attached to a Full Index Scan of the emp_subsidiary table. This means that each of the 295344 rows in emp_subsidiary is checked against the subquery's conditions one by one.

Let me rewrite the SQL into the following join update syntax.

update emp_subsidiary e1, department d1
set    e1.emp_name='Deleted Name'
where  e1.emp_dept = d1.dpt_id
       and d1.dpt_avg_salary <= 6000

The following is the query plan of the rewritten SQL, which takes only 1.22 seconds to complete. The new plan shows a “Nested Loop” from the DEPARTMENT table to the EMP_SUBSIDIARY table. Because the condition “dpt_avg_salary <= 6000” is applied before the loop into EMP_SUBSIDIARY, MySQL avoids the unnecessary work of checking every record in the EMP_SUBSIDIARY table.

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 6 times faster than the original SQL.


How to Tune SQL with OR statements?

SQL statements with OR conditions often perform poorly. Here is an example that shows how to tune such SQL statements in certain situations. The example SQL extracts records from the EMPLOYEE table if (emp_grade < 1050 or emp_id<730000). Emp_grade and emp_id are indexed, and both are not-null fields.

select * from employee
where emp_grade < 1050 or emp_id<730000

You can see that the MySQL SQL optimizer uses an Index Merge of emps_grade_inx and employee_pk to process the SQL. The performance is not as good as expected because the result set is quite big for the sort_union operation; it takes more than 40 seconds to finish the data retrieval. Let me rewrite the OR condition into the following UNION ALL statement. Please make sure that emp_grade and emp_id are not-null columns, otherwise the rewrite may return a wrong result. The rewrite is simple: the first part of the UNION ALL extracts the records with emp_grade<1050, and the second part retrieves the records that satisfy emp_id<730000 but were not already retrieved by the first part.

select    *
from       employee
where    emp_grade < 1050
union all
select    *
from       employee
where    not ( emp_grade < 1050 )
                  and emp_id < 730000

The query plan of this SQL is shown below; it takes 12.46 seconds to finish. The plan shows two “Index Range Scan” steps on the EMPLOYEE table, using EMPLOYEE_PK and EMPS_GRADE_INX.
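
The not-null requirement mentioned above can be demonstrated on a small data set. The Python sketch below assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL, uses made-up rows, and deliberately declares emp_grade as nullable. It first shows that the OR form and the UNION ALL form agree, and then that they diverge once a row with a NULL emp_grade is added.

```python
import sqlite3

# SQLite is only a stand-in for MySQL; the rows below are made up.
# emp_grade is left nullable on purpose to show why NOT NULL matters.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table employee (emp_id integer primary key, emp_grade integer);
    insert into employee values
        (1, 1000), (2, 2000), (3, 1049), (800000, 1000), (900000, 3000);
""")

original = """
    select emp_id from employee
    where emp_grade < 1050 or emp_id < 730000
"""
rewritten = """
    select emp_id from employee
    where emp_grade < 1050
    union all
    select emp_id from employee
    where not (emp_grade < 1050)
      and emp_id < 730000
"""

def ids(sql):
    """Run a query and return its emp_id values as a sorted list."""
    return sorted(row[0] for row in con.execute(sql))

# With no NULLs the two forms agree, and UNION ALL yields no duplicates.
ids_original = ids(original)
ids_rewritten = ids(rewritten)

# A NULL emp_grade makes both 'emp_grade < 1050' and its negation NULL,
# so the row falls through both branches of the UNION ALL.
con.execute("insert into employee values (4, NULL)")
ids_original_null = ids(original)
ids_rewritten_null = ids(rewritten)
print(ids_original == ids_rewritten, ids_original_null, ids_rewritten_null)
```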

Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is more than 3 times faster than the original SQL. Some other rewrites shown in this screen give comparable results too.
