How to Tune CTE “WITH” SQL statement?

SQL Server database and SQL

CTE stands for common table expression. A CTE allows you to define a temporary named result set that available temporarily in the execution scope of a statement such as SELECT, INSERT, UPDATE, DELETE.

The following shows an example of a CTE in MySQL:

WITH
    cte1 AS (SELECT a, b FROM table1),
    cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;

The following is an example shows a SQL statement with CTE WITH expression. The retrieve records from EMPLOYEE that EMP_GRADE and EMP_DEPT have to satisfy the CTE selection result.

with DT1 as
(SELECT
                     EMP_GRADE GRADE,EMP_DEPT DEPT
  FROM     DEPARTMENT,
                     EMP_SUBSIDIARY
 WHERE   DPT_ID = EMP_DEPT
         AND   DPT_AVG_SALARY<500000
         AND   EMP_DEPT<‘D’
         and     EMP_SALARY<1200000
)
select * from  EMPLOYEE where (EMP_GRADE,EMP_DEPT) in
(select GRADE,DEPT from DT1)

Here the following are the query plan of this SQL, it takes 55.9 seconds to finish. The query shows a “Subquery2” with a Nested Loop from sub_emp_salary_inx to DEPARTMENT_PK.

I found the Rows=69606 of step 1 (1 Index Range Scan –  EMP_SUBSIDIARY –  sub_emp_salary_inx) is significant high, it is not reasonable for MySQL SQL optimizer to such path from EMP_SUBSIDIARY to DEPARTMENT. I believe that MySQL optimizer cannot do a good transitivity improvement DPT_ID.  So, I manually add a new condition as “and DPT_ID<‘D’“and a “group by 1,2” to narrow down the result set from CTE.

with DT1
           as (select   EMP_GRADE GRADE,
                                   EMP_DEPT  DEPT
                   from     DEPARTMENT,
                                   EMP_SUBSIDIARY
                 where    DPT_ID = EMP_DEPT
                                   and DPT_ID < ‘D’
                                   and DPT_AVG_SALARY < 500000
                                   and EMP_DEPT < ‘D’
                                   and EMP_SALARY < 1200000
                 group by 1,
                                    2)
  select *
  from         EMPLOYEE
  where      (EMP_GRADE,EMP_DEPT) in (select  GRADE,
                      DEPT
                     from     DT1)

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows correct driving path from DEPARTMENT to EMP_SUBSIDIARY, the estimated Rows now are closer to reality. There are two new steps of GROUP and DT1 (materialized) to narrow down the result set of CTE to future improve the performance.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 2 times faster than the original SQL. There are some other rewrites shown in this screen with comparable results too.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune “Not Exists” SQL statement?

MySQL database and SQL

The following is an example shows a SQL statement with “Not Exists” expression. The SQL retrieve records from emp_subsidiary that satisfy with the “Not  Exists” subquery.

select * from emp_subsidiary sub
where not (exists
(select ‘x’ from employee emp
      where emp.emp_salary<1000
           and emp.emp_dept=sub.emp_dept ))
and sub.emp_grade<1200

Here the following is the query plan of this SQL, it takes 13.36 seconds to finish. The query shows a Nested Loop from emp_subsidiary to the “Materialized Subquery2” from a full table scan of employee.

I found the Rows=2950038 of “Full Table Scan of employee” of step 2 is significantly high to constitute the materialized subquery2(view). In order to reduce the actual number of rows scan of this materialized subquery2(view). I moved the subquery of “Not Exists” to a CTE “WITH” statement and added a “group by 1” to reduce the result set of the CTE in the following.

with DT1
          as (select     emp.emp_dept
                from         employee emp
                where      emp.emp_salary < 1000
                group by 1)
    select *
    from     emp_subsidiary sub
    where  not ( sub.emp_dept in (select emp_dept
                                                                     from   DT1) )
                    and sub.emp_grade < 1200

The following is the query plan of the rewritten SQL and it takes only 2.32 seconds to complete. The new query plan shows an “Index Range Scan” to the Employee table plus a “GROUP” operation to narrow down the result set of the materialized subquery2.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 5 times faster than the original SQL. There are some other rewrites shown in this screen with even better performance, but they are more complicated in SQL syntax transformation and not suitable to discuss here in this short article.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune UPDATE SQL statement with IN subquery (I) ?

optimization of SQL queries

The following is an example shows an Update SQL statement with an “IN” subquery. It updates records from emp_subsidiary that satisfies the “IN” subquery conditions.

update emp_subsidiary set emp_name =‘Deleted Name’
where   emp_dept in
(select dpt_id from department
 where dpt_avg_salary<=6000);

Here the following is the query plan of this SQL, it takes 7.55 seconds to finish the update. The query shows an attached_subqueries attached to a Full Index Scan of emp_subsidiary table. It means that the 295344 rows in emp_subsidiary is going to check the subquery’s conditions one by one.

Let me rewrite the SQL into the following join update syntax.

update emp_subsidiary e1, department d1
set    e1.emp_name=‘Deleted Name’
where  e1.emp_dept = d1.dpt_id
       and d1.dpt_avg_salary <= 6000

The following is the query plan of the rewritten SQL and it takes only 1.22 seconds to complete. The new query plan shows a “Nested Loop” from Department table to Emp_subsidiary table, due to the condition “dpt_avg_salary <= 6000” has been executed before it is going to loop the Emp_subsidiary table, it saved a lot of unnecessary time to detect every record in Emp_subsidiary table.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 6 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL with OR statements?

Tosska Chat Box

It is common that the performance is not good if a SQL statement with OR conditions. Let’s have an example show you how to tune those SQL statements in certain situations. Here is an example SQL that extract records from EMPLOYEE table if (emp_grade < 1050 or emp_id<730000). Emp_grade and emp_id are indexed and they are not null field.

select * from employee
where emp_grade < 1050 or emp_id<730000

You can see MySQL SQL Optimizer use an Index Merge of emps_grade_inx and employee_pk to process the SQL, the performance is not good as expected since the result set is quite big for sort_union operation. It takes more than 40 seconds to finish the data retrieval. Let me rewrite the OR condition into the following UNION ALL statement, please make sure the emp_grade and emp_id are not null column, otherwise it may generate error result. The rewrite is simple that the first part extract data with emp_grade<1050, the second part of the UNION ALL retrieve records that satisfied with emp_id<730000, but it is not retrieved in the first part of the UNION ALL.

select    *
from       employee
where    emp_grade < 1050
union all
select    *
from       employee
where    not ( emp_grade < 1050 )
                  and emp_id < 730000

Here the following is the query plan of this SQL, it takes 12.46 seconds to finish. The query shows two “Index Range Scan” of EMPLOYEE_PK and EMPS_GRADE_INX to the employee table.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL. There are some other rewrites shown in this screen with comparable results too.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL with COUNT(*) statements ?

sql performance monitoring

It is common that we used to count the number of records in a table.  You may encounter unexpected performance degradation in certain situations. Here is an example SQL that count number of records from EMPLOYEE table. There are number of indexes are built such as emp_id, emp_dept, emp_grade, emp_hire_date and etc….

SELECT COUNT(*)
   FROM EMPLOYEE;

You can see MySQL SQL Optimizer use a Full Index Scan of EMP_HIRE_DATE index, the performance is bad since unnecessary random reads is needed and it takes 3 minutes and 6 seconds to count a 3 million records in my computer.  I want to make use of Index Range Scan for specific index, let me rewrite the above SQL into the following syntax. If you know EMP_GRADE is indexed and it is not a nullable column, you can add a dummy condition EMP_GRADE>=’’. It fools MySQL SQL optimizer to consider using EMP_GRADE range index to retrieve the records and it is successfully generate a new plan in the following:

select    COUNT(*)
from       EMPLOYEE
where    EMP_GRADE >=  ‘ ‘

Here the following is the query plan of this SQL, it takes 2.6 seconds to finish. The query shows an “Index Range Scan” of employee table.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 71 times faster than the original SQL. There are some other rewrites shown in this screen with comparable results too.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune substr(emp_name,5,4) SQL Statement?

sql performance monitoring

There may be some business requirements that need to compare certain part of a column as a data retrieval criteria. Here is an example SQL that retrieves data from EMPLOYEE table employee’s name with a string pattern “Acco” start from 5 character of the emp_name.

select    *
  from    employee
where    substr(emp_name,5,4)=‘Acco’

Here the following are the query plans of this SQL, it takes 17 seconds to finish. The query shows a “Full Table Scan Employee”  

You can see that this SQL cannot utilize index scan even the emp_name is indexed field. Let me add a “Force Index(emp_name_inx)“ hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_name >= ‘ ‘ , it is an always true condition that emp_name should be greater or equal to a smallest empty character.

select    *
from       employee force index(emp_name_inx)
where    substr(emp_name,5,4) = ‘Acco’
                 and emp_name >= ‘ ‘

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used now.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 6 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune “order by rand() limit 1” SQL Statement ?

sql performance monitoring

There may be some business requirements that needs to retrieve only some rows from a table (or join tables) randomly. This kind of SQL is normally hard to tune.

For example the following SQL retrieve one row from two tables join of Employee and Department, where Employee’s department code is ‘AAA’, and both Emp_dept and Dpt_id columns are indexed.

select  *
  from employee ,department
where emp_dept=dpt_id
and emp_dept = ‘AAA’
order by rand()
limit 1

Here the following is the query plan of this SQL, it takes 3.126 seconds to finish. The query shows a nested loop from Department table to Employee table to extract all records with ‘AAA’ department code. An Order operation is executed followed from the join result.

You can see the most expensive step is to extract all Employee data with department code “AAA”, if there is an unique key such Employee ID (EMP_ID) which can uniquely identify a row from the query, you can use the With common table expressions in MySQL version 8 to randomly select rows from the KEY column only with the same conditions given by the original query, for example the following blue colored SQL text in “With” clause, it randomly select 1 row of EMP_ID from the join query. It not only significantly reduces the size of the data retrieved from Employee table, but it also shrank the size of Order By operation. And then the main query will use the selected EMP_ID to extract specific row from original query, so the whole query will run faster with this new rewrite syntax.

with DT1
     as (select  EMP_ID
        from   employee,
         department
        where   emp_dept = dpt_id
         and emp_dept = ‘AAA’
        order by rand() limit 1
)
      select    *
      from       employee,
     department
      where    emp_dept = dpt_id
     and emp_dept = ‘AAA’
     and EMP_ID in (select EMP_ID
             from   DT1)
      order by rand() limit 1

/* Remark:  “order by rand() limit 1” is used to make sure that only 1 row will be selected if the EMP_ID cannot uniquely identify only one row */

Here is the query plan of the rewritten SQL with less cost and run much faster.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 100 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune “Order by Limit N” SQL Statement?

sql performance monitoring

There may be some business requirements that need to retrieve the first N number of rows from a join tables. Some people may encounter unexpected performance problem.

Here is an example SQL that retrieves first 1000 row from a join tables of Employee and Department in the order of dpt_id.  Where emp_dept and dpt_id columns are both indexed.

select *
     from employee ,department
where emp_dept=dpt_id
order by dpt_id
limit 1000

Here the following are the query plans of this SQL, it takes 1 mins and 8 seconds to finish. The query shows a nested loop from “Full Table Scan Employee” to “Unique Key Lookup Department” table to extract all records. An “Order by” operation of dpt_id of Department table is executed followed from this join result.

You can see the most expensive step is to extract all Employee data with Department data and then an Order by dpt_id  of Department table. Let’s see if we rewrite the original SQL text into the syntax that “order by dpt_id” is changed to “order by emp_dept”. It is a semantically eqvialent rewrite of original SQL statement since “emp_dept = dpt_id”.

select       *
from          employee,
                     department
where       emp_dept = dpt_id
order by    emp_dept
limit            1000

Here is the query plan of the rewritten SQL with less cost and run much faster. The new query plan shows that no “Use temporary; Using filesort” in Tabular Explain, it means the Nested Loop operation will be stop at the Limit 1000 records is done.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 70 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL with LIKE ‘%Mary%’ comparison?

sql performance monitoring

The LIKE is a logical operator that determines if a character string matches a specified pattern. A pattern may include regular characters and wildcard characters. The LIKE operator is used in the WHERE clause of the SELECT, UPDATE, and DELETE statements to filter rows based on pattern matching.

Here is an example SQL that retrieves data from EMPLOYEE table employee’s name with a string pattern like “Mary%”. If the emp_name is indexed, the following SQL will utilize Index Range Scan of the emp_name and the speed of the SQL will be fine.

select *
from employee
where emp_name like ‘Mary%’;

If user is looking for emp_name with pattern like ‘%Mary%’, MySQL SQL Optimizer cannot user emp_name index to speed up the process, full table scan is normally be used and the performance will be bad too.

select *
from employee
where emp_name like ‘Mary%’;

Here the following are the query plan of this SQL, it takes 18.8 seconds to finish. The query shows a “Full Table Scan” of employee table.

You can see that this SQL cannot utilize index scan even the emp_name is indexed. Let me add a “Force Index(emp_name_inx)“ hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_name >= ” , it is an always true condition that emp_name should always greater or equal to a smallest empty character.

select *
from   employee force index(emp_name_inx)
where  emp_name like ‘%Mary%’
       and emp_name >= ‘ ‘;

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used now.

This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 5 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

Analyzing the Inner Works of MySQL SQL Performance Tuning and Oracle Database

When it comes to MySQL, slow performance in large tables is one of the main sources of complaints. It is true that some users face problems as their database fails to sufficiently handle a more than a certain number of rows.

However, there also are many corporations that use MySQL for millions, even billions of rows of data and yet they successfully deliver excellent promise. So, why is there a contradiction between these two cases? The answer lies in understanding the intricacies of table designing in MySQL, and with the help of MySQL SQL performance tuning, how to make them work in your favour.    

What to Consider During MySQL SQL Performance Tuning

There are three major aspects of the database that can have an impact on databases with huge amounts of data. Let’s take a look at two of them now:

Buffers

The first thing to consider with any database management system is that you must have an estimate of the memory, even as you progress in terms of data accumulation. It is important for the memory to be sufficient because performance suffers greatly if it isn’t, so don’t be surprised if a drop in performance is greater than you anticipated because you may have lost track of the growth in data size and subsequently, the need for more memory space. This applies to the other aspects covered in this blog as well. Once data outgrows the memory, everything can be expected to slow down, and MySQL database and SQL becomes a necessity.

One way to ensure the memory remains sufficient for your data is to practice data partitioning. In this process, old data that is no longer required as often as recent data, is separated and stored in other servers. There are various other ways of ensuring sufficient space which we will talk about in another blog.

Indices

Indices, or indexes, are known by most of us to be a useful tool in improving the accessing speed of the database. An important thing to remember is that their usefulness depends a lot on exclusivity, i.e., the ability to select a number of rows that match with specific index ranges or values. Also, the nature of the workload – specifically whether it is cached or not – determines how much it will benefit from the use of an index.

This is actually overlooked by even MySQL optimizer at present and may need to be checked by other MySQL SQL performance tuning tools. Workload indices have a chance of much quicker access even if the size of the data being accessed is as large as fifty percent of the entire number of rows, as long as they are in-memory. On the other hand, for disk IO bound access, you may have greater success in fetching data through a full table scan irrespective of the number of rows you are requesting access to.

Since indices can differ from each other in many ways, they need to be used differently in order to effectively use them. For instance, you can place them either in a well-organized manner or at random spots, resulting in significant changes in their speed. Innodb also includes clustered keys which work by merging data and index access – such keys end up conserving IO that will prove invaluable for workloads that are entirely disk-bound.

In Conclusion

Designing table structures smartly involves taking into consideration all the abilities and disabilities of MySQL. This is especially important if you have to handle different kinds of databases in your organization.

The main reason why your organization has different databases in the first place is because of their different capabilities and shortcomings. So, the same design concepts won’t bring the same results in say, MS SQL or Oracle that they did in MySQL and vice versa. The same is true for their storage engines – each can have a different effect on the performance.

Once you have applied the right application architecture to plan your tables, you will be able to create applications that can easily handle huge data sets on the basis of MySQL.

Proper MySQL SQL performance tuning involves optimizations that can greatly boost the rate at which indices are accessed or scanned. There already are tools by Tosska Technologies Limited for this purpose like Tosska SQL Tuning Expert (TSEM™) for MySQL which you can download and start using today. Contact our team for further information or enquiries.