How to Tune SQL Statement with IN List Bind Variables for MySQL?


The following example shows a SQL statement with a variable in an IN-list operator. The SQL retrieves records from the EMPLOYEE table where (EMP_ID, @1) matches any value in the set of values on the right-hand side.

select * from employee
where  (emp_id,@1) in ((1000000,'a'),(2000000,'b'),(3000000,'c'))

The query plan, shown in Tosska's proprietary tree format, follows. The query takes 19 seconds to finish.

The query plan shows a full table scan of EMPLOYEE. This means MySQL cannot decompose the IN-list syntax into a form that is better for cost evaluation, so no index scan is used.

Let me rewrite the IN list into multiple OR conditions as follows:

select *
from   employee
where  (  ( emp_id = 1000000
            and @1 = 'a' )
       or ( emp_id = 2000000
            and @1 = 'b' )
       or ( emp_id = 3000000
            and @1 = 'c' ) )

Now MySQL can use a single-row (constant) index search. The rewritten SQL takes 0.0059 seconds, which is much faster than the original.
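Before trusting a rewrite like this, it can help to confirm on a small table that both forms return the same rows. The following is a minimal sketch of such a check, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The table contents are invented, the named parameter :v plays the role of @1, and SQLite needs the keyword VALUES in front of the row list, which MySQL does not.

```python
import sqlite3

# Toy stand-in for the EMPLOYEE table (contents are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1000000, "Ann"), (2000000, "Bob"), (3000000, "Cid"), (4000000, "Dee")],
)

# Original form: row-value IN list. SQLite requires VALUES on the right-hand
# side, whereas MySQL accepts the bare list ((1000000,'a'),(2000000,'b'),...).
IN_LIST_SQL = """
    SELECT emp_id FROM employee
    WHERE (emp_id, :v) IN (VALUES (1000000, 'a'), (2000000, 'b'), (3000000, 'c'))
    ORDER BY emp_id
"""

# Rewritten form: one OR branch per element of the IN list.
OR_SQL = """
    SELECT emp_id FROM employee
    WHERE (emp_id = 1000000 AND :v = 'a')
       OR (emp_id = 2000000 AND :v = 'b')
       OR (emp_id = 3000000 AND :v = 'c')
    ORDER BY emp_id
"""

def run(sql, value):
    """Run one of the two queries with :v bound to value; return the emp_ids."""
    return [row[0] for row in conn.execute(sql, {"v": value})]

for value in ("a", "b", "c", "z"):
    print(value, run(IN_LIST_SQL, value), run(OR_SQL, value))
```

This only checks that the two forms agree on results. It says nothing about which plan MySQL will pick, which still has to be confirmed with EXPLAIN on the real data.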

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is more than 3200 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune Delete SQL statement with subqueries for MySQL?


The following example shows a DELETE SQL statement with a subquery. The SQL deletes records from emp_subsidiary that satisfy the three conditions shown in the following query.

delete from emp_subsidiary
where emp_id in
         (SELECT EMP_ID
          FROM   EMPLOYEE
          WHERE  emp_salary < 15000)
and emp_dept < 'D'
and emp_grade < 1500

The query plan in tabular format follows. The statement takes 8.88 seconds to finish.

DELETE statements are normally difficult to tune because the MySQL SQL optimizer generates a relatively small plan space for DELETE statements compared with SELECT statements. Simply put, MySQL will not generate many alternative plans for you, no matter how elaborately you rewrite the syntax of your DELETE statement. But there is a loophole in MySQL version 8 that we have to be aware of: the order of the conditions listed in the DELETE statement. The following rewrite reorders the filtering conditions and has the same query plan as the original SQL in both the Tree Plan and the Tabular Plan, yet the run time improves to 3.88 seconds.

delete from emp_subsidiary
where  emp_dept < 'D'
       and emp_grade < 1500
       and emp_id in (select EMP_ID
                      from   EMPLOYEE
                      where  emp_salary < 15000)

Since there is no change in the Tree Plan or the Tabular Plan, we have to check the Visual Plan, where we find one difference (marked with a red box in the plan): the execution order of the Attached Condition has changed, and the time-consuming subquery is now placed at the end of the Attached Condition. This means that if either of the first two conditions is false, the subquery does not need to be executed. This may explain why the second DELETE statement runs much faster than the original SQL statement.
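The short-circuit effect described above can be illustrated outside the database. The following is a rough sketch in plain Python, not a model of MySQL internals: the function in_subquery is a hypothetical stand-in for the costly subquery probe, and the rows are invented. It counts how often the expensive check runs under each ordering of the conditions.

```python
def count_subquery_calls(rows, subquery_first):
    """Apply the three DELETE conditions to each row and count how many
    times the expensive subquery stand-in is actually evaluated."""
    calls = 0

    def in_subquery(emp_id):
        # Hypothetical stand-in for "emp_id in (select ... from employee ...)".
        nonlocal calls
        calls += 1
        return emp_id % 2 == 0

    deleted = []
    for emp_id, emp_dept, emp_grade in rows:
        if subquery_first:
            # Original order: the subquery is probed for every row.
            hit = in_subquery(emp_id) and emp_dept < "D" and emp_grade < 1500
        else:
            # Reordered: cheap filters first, subquery only for survivors.
            hit = emp_dept < "D" and emp_grade < 1500 and in_subquery(emp_id)
        if hit:
            deleted.append(emp_id)
    return deleted, calls


# Invented sample rows: (emp_id, emp_dept, emp_grade).
rows = [
    (1, "A", 1000), (2, "B", 1200), (3, "E", 1000),
    (4, "F", 2000), (5, "C", 1600), (6, "A", 1400),
]

print(count_subquery_calls(rows, subquery_first=True))
print(count_subquery_calls(rows, subquery_first=False))
```

Both orderings select the same rows for deletion, but with the cheap filters first the subquery stand-in runs only for the rows that pass them (3 times instead of 6 on this sample).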

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is more than 2 times faster than the original SQL.


How to Tune CTE “WITH” SQL statement?


CTE stands for common table expression. A CTE lets you define a temporary named result set that is available only within the execution scope of a single statement such as SELECT, INSERT, UPDATE, or DELETE.

The following shows an example of a CTE in MySQL:

WITH
    cte1 AS (SELECT a, b FROM table1),
    cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;

The following example shows a SQL statement with a CTE WITH expression. The SQL retrieves records from EMPLOYEE whose EMP_GRADE and EMP_DEPT match the result selected by the CTE.

with DT1 as
(SELECT   EMP_GRADE GRADE, EMP_DEPT DEPT
 FROM     DEPARTMENT,
          EMP_SUBSIDIARY
 WHERE    DPT_ID = EMP_DEPT
   AND    DPT_AVG_SALARY < 500000
   AND    EMP_DEPT < 'D'
   AND    EMP_SALARY < 1200000
)
select * from EMPLOYEE where (EMP_GRADE,EMP_DEPT) in
(select GRADE,DEPT from DT1)

The query plan of this SQL follows. The query takes 55.9 seconds to finish. The plan shows a “Subquery2” with a Nested Loop from sub_emp_salary_inx to DEPARTMENT_PK.

I found that the Rows=69606 of step 1 (1 Index Range Scan – EMP_SUBSIDIARY – sub_emp_salary_inx) is significantly high, so it is not reasonable for the MySQL SQL optimizer to choose a path from EMP_SUBSIDIARY to DEPARTMENT. I believe the MySQL optimizer cannot carry the condition on EMP_DEPT over to DPT_ID through transitivity. So I manually added a new condition “and DPT_ID < 'D'” and a “group by 1,2” to narrow down the result set of the CTE.

with DT1
     as (select   EMP_GRADE GRADE,
                  EMP_DEPT  DEPT
         from     DEPARTMENT,
                  EMP_SUBSIDIARY
         where    DPT_ID = EMP_DEPT
                  and DPT_ID < 'D'
                  and DPT_AVG_SALARY < 500000
                  and EMP_DEPT < 'D'
                  and EMP_SALARY < 1200000
         group by 1,
                  2)
select *
from   EMPLOYEE
where  (EMP_GRADE,EMP_DEPT) in (select GRADE,
                                       DEPT
                                from   DT1)

The query plan of the rewritten SQL follows, and it runs faster. The new query plan shows the correct driving path from DEPARTMENT to EMP_SUBSIDIARY, and the estimated Rows are now closer to reality. There are two new steps, GROUP and DT1 (materialized), which narrow down the result set of the CTE to further improve performance.
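The two additions are safe because the extra DPT_ID condition is already implied by DPT_ID = EMP_DEPT and EMP_DEPT < 'D', and because an IN test does not care about duplicate rows in the subquery. The sketch below checks this on a toy schema, assuming Python's built-in sqlite3 module in place of MySQL; all table contents and the helper names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dpt_id TEXT PRIMARY KEY, dpt_avg_salary INTEGER);
    CREATE TABLE emp_subsidiary (emp_dept TEXT, emp_grade INTEGER, emp_salary INTEGER);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_grade INTEGER, emp_dept TEXT);
    INSERT INTO department VALUES ('A', 100000), ('B', 600000), ('C', 200000), ('E', 100000);
    INSERT INTO emp_subsidiary VALUES
        ('A', 1, 1000), ('A', 1, 2000), ('A', 2, 5000000),
        ('B', 1, 1000), ('C', 3, 1000), ('E', 1, 1000);
    INSERT INTO employee VALUES (1, 1, 'A'), (2, 2, 'A'), (3, 3, 'C'), (4, 1, 'E'), (5, 1, 'B');
""")

# Body of the CTE as in the original statement.
ORIGINAL_CTE = """
    SELECT emp_grade AS grade, emp_dept AS dept
    FROM department, emp_subsidiary
    WHERE dpt_id = emp_dept
      AND dpt_avg_salary < 500000
      AND emp_dept < 'D'
      AND emp_salary < 1200000
"""

# Body of the CTE after adding the implied condition and the GROUP BY.
REWRITTEN_CTE = """
    SELECT emp_grade AS grade, emp_dept AS dept
    FROM department, emp_subsidiary
    WHERE dpt_id = emp_dept
      AND dpt_id < 'D'
      AND dpt_avg_salary < 500000
      AND emp_dept < 'D'
      AND emp_salary < 1200000
    GROUP BY 1, 2
"""

def matching_ids(cte_body):
    """Run the main query on top of the given CTE body; return the emp_ids."""
    sql = f"""
        WITH dt1 AS ({cte_body})
        SELECT emp_id FROM employee
        WHERE (emp_grade, emp_dept) IN (SELECT grade, dept FROM dt1)
        ORDER BY emp_id
    """
    return [row[0] for row in conn.execute(sql)]

def cte_row_count(cte_body):
    """Number of rows the CTE itself produces."""
    sql = f"WITH dt1 AS ({cte_body}) SELECT COUNT(*) FROM dt1"
    return conn.execute(sql).fetchone()[0]

print(matching_ids(ORIGINAL_CTE), cte_row_count(ORIGINAL_CTE))
print(matching_ids(REWRITTEN_CTE), cte_row_count(REWRITTEN_CTE))
```

On this sample both versions return the same employees, while the grouped CTE holds fewer rows because the duplicate (grade, dept) pair is collapsed.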

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is more than 2 times faster than the original SQL. Other rewrites shown on this screen have comparable results too.


How to Tune “Not Exists” SQL statement?


The following example shows a SQL statement with a “Not Exists” expression. The SQL retrieves records from emp_subsidiary that satisfy the “Not Exists” subquery.

select * from emp_subsidiary sub
where not (exists
(select 'x' from employee emp
      where emp.emp_salary<1000
           and emp.emp_dept=sub.emp_dept ))
and sub.emp_grade<1200

The query plan of this SQL follows. The query takes 13.36 seconds to finish. The plan shows a Nested Loop from emp_subsidiary to the “Materialized Subquery2”, which is built from a full table scan of employee.

I found that the Rows=2950038 of the “Full Table Scan of employee” in step 2 is significantly high for building the materialized subquery2 (view). To reduce the actual number of rows scanned for this materialized subquery2 (view), I moved the “Not Exists” subquery into a CTE “WITH” statement and added a “group by 1” to reduce the result set of the CTE, as follows.

with DT1
          as (select     emp.emp_dept
                from         employee emp
                where      emp.emp_salary < 1000
                group by 1)
    select *
    from     emp_subsidiary sub
    where  not ( sub.emp_dept in (select emp_dept
                                  from   DT1) )
           and sub.emp_grade < 1200

The query plan of the rewritten SQL follows. It takes only 2.32 seconds to complete. The new query plan shows an “Index Range Scan” of the employee table plus a “GROUP” operation that narrows down the result set of the materialized subquery2.
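One point worth checking before applying this rewrite is NULL handling: NOT EXISTS and NOT IN agree only when emp_dept contains no NULLs. The sketch below shows both cases on invented data, assuming Python's built-in sqlite3 module as a stand-in for MySQL (both follow the standard three-valued logic here). The helper names make_db and ids are hypothetical.

```python
import sqlite3

def make_db(employee_rows, subsidiary_rows):
    """Build a throwaway in-memory database with the two toy tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_dept TEXT, emp_salary INTEGER)")
    conn.execute(
        "CREATE TABLE emp_subsidiary (emp_id INTEGER, emp_dept TEXT, emp_grade INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?)", employee_rows)
    conn.executemany("INSERT INTO emp_subsidiary VALUES (?, ?, ?)", subsidiary_rows)
    return conn

NOT_EXISTS_SQL = """
    SELECT emp_id FROM emp_subsidiary sub
    WHERE NOT EXISTS (SELECT 'x' FROM employee emp
                      WHERE emp.emp_salary < 1000
                        AND emp.emp_dept = sub.emp_dept)
      AND sub.emp_grade < 1200
    ORDER BY emp_id
"""

CTE_NOT_IN_SQL = """
    WITH DT1 AS (SELECT emp.emp_dept FROM employee emp
                 WHERE emp.emp_salary < 1000
                 GROUP BY 1)
    SELECT emp_id FROM emp_subsidiary sub
    WHERE NOT (sub.emp_dept IN (SELECT emp_dept FROM DT1))
      AND sub.emp_grade < 1200
    ORDER BY emp_id
"""

def ids(conn, sql):
    return [row[0] for row in conn.execute(sql)]

# Invented sample data.
EMPLOYEE_ROWS = [("A", 500), ("A", 800), ("B", 2000), ("C", 900)]
SUBSIDIARY_ROWS = [(1, "A", 1000), (2, "B", 1100), (3, "C", 1150),
                   (4, "D", 1000), (5, "B", 1300)]

# Case 1: no NULLs in emp_dept, so the two forms agree.
clean = make_db(EMPLOYEE_ROWS, SUBSIDIARY_ROWS)
print(ids(clean, NOT_EXISTS_SQL), ids(clean, CTE_NOT_IN_SQL))

# Case 2: one low-salary employee with a NULL department.
with_null = make_db(EMPLOYEE_ROWS + [(None, 100)], SUBSIDIARY_ROWS)
print(ids(with_null, NOT_EXISTS_SQL), ids(with_null, CTE_NOT_IN_SQL))
```

In the first case both queries return employees 2 and 4. In the second case the NOT EXISTS form still returns them, while the NOT IN form returns nothing, because a NULL in the subquery result makes every non-matching NOT IN test evaluate to unknown. The rewrite is therefore only safe when emp_dept is declared NOT NULL or NULLs are filtered out.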

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is more than 5 times faster than the original SQL. Other rewrites shown on this screen perform even better, but their SQL syntax transformations are more complicated and not suitable for discussion in this short article.


How to Tune UPDATE SQL statement with IN subquery (I)?


The following example shows an UPDATE SQL statement with an “IN” subquery. It updates records in emp_subsidiary that satisfy the “IN” subquery condition.

update emp_subsidiary set emp_name = 'Deleted Name'
where   emp_dept in
(select dpt_id from department
 where dpt_avg_salary<=6000);

The query plan of this SQL follows. The update takes 7.55 seconds to finish. The plan shows attached_subqueries attached to a Full Index Scan of the emp_subsidiary table, which means that each of the 295344 rows in emp_subsidiary is checked against the subquery's conditions one by one.

Let me rewrite the SQL into the following join update syntax.

update emp_subsidiary e1, department d1
set    e1.emp_name = 'Deleted Name'
where  e1.emp_dept = d1.dpt_id
       and d1.dpt_avg_salary <= 6000

The query plan of the rewritten SQL follows. It takes only 1.22 seconds to complete. The new query plan shows a “Nested Loop” from the Department table to the Emp_subsidiary table. Because the condition “dpt_avg_salary <= 6000” is evaluated before looping over the Emp_subsidiary table, it saves the time otherwise spent checking every record in Emp_subsidiary.

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is more than 6 times faster than the original SQL.


How to Tune SQL with OR statements?


SQL statements with OR conditions often perform poorly. Let's look at an example that shows how to tune such statements in certain situations. Here is an example SQL that extracts records from the EMPLOYEE table if (emp_grade < 1050 or emp_id < 730000). Emp_grade and emp_id are both indexed, and both are not-null fields.

select * from employee
where emp_grade < 1050 or emp_id<730000

You can see that the MySQL SQL optimizer uses an Index Merge of emps_grade_inx and employee_pk to process the SQL. The performance is not as good as expected, since the result set is quite big for the sort_union operation. It takes more than 40 seconds to finish the data retrieval.

Let me rewrite the OR condition into the following UNION ALL statement. Please make sure that emp_grade and emp_id are not-null columns; otherwise the rewrite may return wrong results. The rewrite is simple: the first part of the UNION ALL extracts rows with emp_grade < 1050, and the second part retrieves rows that satisfy emp_id < 730000 but were not already retrieved by the first part.

select    *
from       employee
where    emp_grade < 1050
union all
select    *
from       employee
where    not ( emp_grade < 1050 )
                  and emp_id < 730000

The query plan of the rewritten SQL follows. It takes 12.46 seconds to finish. The plan shows two “Index Range Scan” operations on the employee table, one on EMPLOYEE_PK and one on EMPS_GRADE_INX.
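The not-null requirement for this rewrite can be seen on a few rows. The following sketch assumes Python's built-in sqlite3 module in place of MySQL and uses invented data. It runs the OR form and the UNION ALL form first on rows without NULLs and then with one NULL emp_grade added; the helper name query_ids is hypothetical.

```python
import sqlite3

OR_SQL = "SELECT emp_id FROM employee WHERE emp_grade < 1050 OR emp_id < 730000"

UNION_ALL_SQL = """
    SELECT emp_id FROM employee WHERE emp_grade < 1050
    UNION ALL
    SELECT emp_id FROM employee WHERE NOT (emp_grade < 1050) AND emp_id < 730000
"""

def query_ids(rows, sql):
    """Load rows into a fresh in-memory EMPLOYEE table and run sql on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_id INTEGER, emp_grade INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    return sorted(row[0] for row in conn.execute(sql))

# Invented sample rows: (emp_id, emp_grade).
not_null_rows = [(100, 1000), (200, 2000), (800000, 1000), (900000, 3000)]
rows_with_null = not_null_rows + [(300, None)]

print(query_ids(not_null_rows, OR_SQL), query_ids(not_null_rows, UNION_ALL_SQL))
print(query_ids(rows_with_null, OR_SQL), query_ids(rows_with_null, UNION_ALL_SQL))
```

Without NULLs both forms return the same ids and the UNION ALL produces no duplicates. With a NULL emp_grade, the OR form still returns employee 300 through the emp_id branch, but the UNION ALL form loses it, because NOT (NULL < 1050) evaluates to unknown rather than true.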

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is more than 3 times faster than the original SQL. Other rewrites shown on this screen have comparable results too.


How to Tune SQL with COUNT(*) statements?


Counting the number of records in a table is a common task, but you may encounter unexpected performance degradation in certain situations. Here is an example SQL that counts the number of records in the EMPLOYEE table. A number of indexes have been built on it, on columns such as emp_id, emp_dept, emp_grade, and emp_hire_date.

SELECT COUNT(*)
   FROM EMPLOYEE;

You can see that the MySQL SQL optimizer uses a Full Index Scan of the EMP_HIRE_DATE index. The performance is bad because unnecessary random reads are needed, and it takes 3 minutes and 6 seconds to count 3 million records on my computer. I want to make use of an Index Range Scan on a specific index, so let me rewrite the above SQL into the following syntax. If you know that EMP_GRADE is indexed and is not a nullable column, you can add a dummy condition EMP_GRADE >= ''. It fools the MySQL SQL optimizer into considering a range scan of the EMP_GRADE index to retrieve the records, and it successfully generates a new plan, as follows:

select    COUNT(*)
from       EMPLOYEE
where    EMP_GRADE >= ' '

The query plan of the rewritten SQL follows. It takes 2.6 seconds to finish. The plan shows an “Index Range Scan” of the employee table.

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is almost 71 times faster than the original SQL. Other rewrites shown on this screen have comparable results too.


How to Tune substr(emp_name,5,4) SQL Statement?


Some business requirements need to compare a certain part of a column as a data retrieval criterion. Here is an example SQL that retrieves rows from the EMPLOYEE table where the employee's name contains the string pattern “Acco” starting at the 5th character of emp_name.

select    *
  from    employee
where    substr(emp_name,5,4) = 'Acco'

The query plan of this SQL follows. The query takes 17 seconds to finish. The plan shows a “Full Table Scan Employee”.

You can see that this SQL cannot use an index scan even though emp_name is an indexed field. I added a “Force Index(emp_name_inx)” hint to the SQL, hoping it would help the MySQL SQL optimizer use an index scan, but it still failed to enable the index scan. So I added one more dummy condition, emp_name >= ' '. It is an always-true condition, since emp_name should be greater than or equal to the smallest (empty) character.

select    *
from       employee force index(emp_name_inx)
where    substr(emp_name,5,4) = 'Acco'
         and emp_name >= ' '

The query plan of the rewritten SQL follows, and it runs faster. The new query plan shows that an Index Range Scan is now used.

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is almost 6 times faster than the original SQL.


How to Tune “order by rand() limit 1” SQL Statement?


Some business requirements need to retrieve only a few rows from a table (or from joined tables) at random. This kind of SQL is normally hard to tune.

For example, the following SQL retrieves one random row from a join of the Employee and Department tables, where the employee's department code is 'AAA'. Both the Emp_dept and Dpt_id columns are indexed.

select  *
  from employee ,department
where emp_dept=dpt_id
and emp_dept = 'AAA'
order by rand()
limit 1

The query plan of this SQL follows. The query takes 3.126 seconds to finish. The plan shows a nested loop from the Department table to the Employee table that extracts all records with the 'AAA' department code. An Order operation is then executed on the join result.

You can see that the most expensive step is extracting all Employee data with department code 'AAA'. If there is a unique key, such as the employee ID (EMP_ID), that can uniquely identify a row of the query, you can use a “With” common table expression in MySQL version 8 to randomly select rows from the key column only, using the same conditions as the original query. For example, the “With” clause in the following SQL randomly selects 1 row of EMP_ID from the join query. This not only significantly reduces the amount of data retrieved from the Employee table, it also shrinks the size of the Order By operation. The main query then uses the selected EMP_ID to extract the specific row from the original query, so the whole query runs faster with this rewrite.

with DT1
     as (select   EMP_ID
         from     employee,
                  department
         where    emp_dept = dpt_id
                  and emp_dept = 'AAA'
         order by rand() limit 1)
select   *
from     employee,
         department
where    emp_dept = dpt_id
         and emp_dept = 'AAA'
         and EMP_ID in (select EMP_ID
                        from   DT1)
order by rand() limit 1

/* Remark: the outer “order by rand() limit 1” makes sure that only 1 row is selected if EMP_ID cannot uniquely identify a single row */

The query plan of the rewritten SQL follows. It has a lower cost and runs much faster.
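The key-first pattern can be tried out on a toy schema. The sketch below assumes Python's built-in sqlite3 module instead of MySQL, so random() takes the place of MySQL's rand(). The table contents, the dpt_name column, and the helper name pick_random_employee are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dpt_id TEXT PRIMARY KEY, dpt_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_dept TEXT);
    INSERT INTO department VALUES ('AAA', 'Accounting'), ('BBB', 'Billing');
    INSERT INTO employee VALUES (1, 'AAA'), (2, 'AAA'), (3, 'BBB'), (4, 'AAA');
""")

# The CTE sorts and limits only the narrow key column; the main query then
# fetches the full joined row for the single key that was picked.
KEY_FIRST_SQL = """
    WITH dt1 AS (SELECT emp_id
                 FROM employee, department
                 WHERE emp_dept = dpt_id AND emp_dept = 'AAA'
                 ORDER BY random() LIMIT 1)
    SELECT emp_id, emp_dept, dpt_name
    FROM employee, department
    WHERE emp_dept = dpt_id
      AND emp_dept = 'AAA'
      AND emp_id IN (SELECT emp_id FROM dt1)
    ORDER BY random() LIMIT 1
"""

def pick_random_employee():
    """Return the single randomly chosen joined row as a list of tuples."""
    return conn.execute(KEY_FIRST_SQL).fetchall()

print(pick_random_employee())
```

Each call returns exactly one joined row for one of the 'AAA' employees. Whether the rewrite is actually faster depends on the MySQL plan and the row width of the real tables, so the timing still has to be measured there.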

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is almost 100 times faster than the original SQL.


How to Tune “Order by Limit N” SQL Statement?


Some business requirements need to retrieve the first N rows from joined tables. Some people may encounter unexpected performance problems with this kind of SQL.

Here is an example SQL that retrieves the first 1000 rows from a join of the Employee and Department tables, in dpt_id order. The emp_dept and dpt_id columns are both indexed.

select *
     from employee ,department
where emp_dept=dpt_id
order by dpt_id
limit 1000

The query plan of this SQL follows. The query takes 1 minute and 8 seconds to finish. The plan shows a nested loop from “Full Table Scan Employee” to “Unique Key Lookup Department” that extracts all records. An “Order by” operation on dpt_id of the Department table is then executed on the join result.

You can see that the most expensive step is extracting all Employee data joined with Department data and then ordering by dpt_id of the Department table. Let's see what happens if we rewrite the original SQL so that “order by dpt_id” is changed to “order by emp_dept”. It is a semantically equivalent rewrite of the original SQL statement, since “emp_dept = dpt_id”.

select       *
from          employee,
                     department
where       emp_dept = dpt_id
order by    emp_dept
limit            1000

The query plan of the rewritten SQL follows. It has a lower cost and runs much faster. The new query plan shows no “Using temporary; Using filesort” in the Tabular Explain, which means the Nested Loop operation stops as soon as the limit of 1000 records is reached.

Tosska SQL Tuning Expert for MySQL can perform this kind of rewrite automatically. It shows that the rewrite is almost 70 times faster than the original SQL.
