How to Tune SQL Statements with Rewrite and Hints Injection for MySQL?

sql tuning for MySQL

There are some SQL statements with performance problem have to be tuned by SQL syntax rewrite and Hints injection, it is a little bit difficult for SQL tuning newcomers to master this technique. Developers not only have to understand the relationship between SQL syntax and the final query plan generation but have to understand the usage of optimizer hints and its limitations. Sometimes these two tuning techniques application will affect each other in a complex SQL statement.

Here is a simple example SQL that retrieves data from EMPLOYEE and DEPARTMENT tables.

select  * from employee,department
where emp_dept=dpt_id
   and emp_dept<‘L’
   and emp_id<1500000
   and emp_salary= dpt_avg_salary
order by dpt_avg_salary

Here the following are the query plans of this SQL, it takes 7.7 seconds to finish. The query shows a “Full Table Scan Department” and nested loop Employee table with a Non-Unique Key Lookup EMPS_SALARY_INX.

You can see that this SQL cannot utilize index scan even though the dpt_dept is an indexed field. It is because the condition emp_dept<‘L’ is not explicitly induced the condition dpt_id < ‘L’ although emp_dept=dpt_id is also listed in the where clause.

To enable the index search of Department table, I explicitly add a condition dpt_id < ‘L’ to the SQL statement as the following:

select   *
from  employee,
     department
where  emp_dept = dpt_id
     and dpt_id < ‘L’
     and emp_dept < ‘L’
     and emp_id < 1500000
     and emp_salary = dpt_avg_salary
order by  dpt_avg_salary

Here is the query plan of the rewritten SQL and the execution time is reduced to 3.4 seconds. The new query plan shows that an Index Range Scan is used for the Department table and nested loop Employee table.

You may find that the nested loop to Employee by EMPS_SALARY_INX lookup may result into a lot of random access to the Employee table. Let me add a BKA hint to ask MySQL to use ‘Batched Key Access’ to join the two tables.

select   /*+ QB_NAME(QB1) BKA(`employee`@QB1) */ *
from  employee,
     department
where  emp_dept = dpt_id
     and dpt_id < ‘L’
     and emp_dept < ‘L’
     and emp_id < 1500000
     and emp_salary = dpt_avg_salary
order by  dpt_avg_salary

The new query plan shows a Batched Key Access is used to join Department and Employee tables, you can BAK information from MySQL manual for details, the new plan takes only 1.99 seconds and it is more than 3 times better than the original SQL syntax.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

How to Tune SQL Statement with CASE Expression for SQL Server I?

sql performance monitoring

Here the following is a simple SQL statement with a CASE expression syntax.

SELECT *
FROM EMPLOYEE
WHERE
CASE
when  emp_id   < 1001000 then ‘Old Employee’
when  emp_dept <‘B’   then ‘Old Department’
ELSE‘Normal’
END = ‘old Employee’

Here the following are the query plans of this SQL, it takes 2.23 seconds in a cold cache situation, which means data will be cached during the SQL is executing. The query shows a Full Table Scan of the EMPLOYEE table due to the CASE expression cannot utilize the emp_id index or emp_dept index.

We can rewrite the CASE expression into the following syntax with multiple OR conditions.

select *
from  EMPLOYEE
where  emp_id < 1005000
     and ‘Old Employee’ = ‘Old Employee’
     or not  ( emp_id < 1005000 )
       and emp_dept < ‘B’
       and‘Old Department’ = ‘Old Employee’
     or not  ( emp_id < 1005000 )
       and not ( emp_dept < ‘B’ )
       and‘Normal’ = ‘Old Employee’

Here is the query plan of the rewritten SQL and the speed is 0.086 seconds. It is 25 times better than the original syntax. The new query plan shows an Index Seek of EMP_ID index.

This SQL rewrite is useful when the CASE expression is equal to a hardcoded literal, but if the literal “  =’Old Employee’ ” replaced by a variable “ = :var ”, this rewrite may not be useful, I will discuss it in my next blog.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for SQL Server automatically.

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited

How to Tune SQL Statement with OR conditions in a Subquery for SQL Server?

sql performance monitoring

The following is an example that shows a SQL statement with an EXISTS subquery. The SQL counts the records from the EMPLOYEE table if the OR conditions are satisfied in the subquery of the DEPARTMENT table.

select countn(*) from employee a where
exists (select ‘x’ from department b
    where a.emp_id=b.dpt_manager or a.emp_salary=b.dpt_avg_salary
     )

Here the following is the query plan in the Tosska proprietary tree format, it takes 4 minutes and 29 seconds to finish.

The query plan shows a Nested Loops from EMPLOYEE to full table scan DEPARTMENT, it is the main problem of the entire query plan, the reason is the SQL Server cannot resolve this OR conditions  ”a.emp_id=b.dpt_manager or a.emp_salary=b.dpt_avg_salary” by other join operations.

Let me rewrite the OR conditions in the subquery into a UNION ALL subquery in the following, the first part of the UNION ALL in the subquery represents the “a.emp_id=b.dpt_manager” condition, the second part represents the “a.emp_salary=b.dpt_avg_salary” condition but exclude the data that already satisfied with the first condition.

select  count(*)
from   employee a
where  exists ( select  ‘x’
        from   department b
        where  a.emp_id = b.dpt_manager
        union all
        select  ‘x’
        from   department b
        where  ( not ( a.emp_id = b.dpt_manager )
            or b.dpt_manager is null )
            and a.emp_salary = b.dpt_avg_salary )

Here the following is the query plan of the rewritten SQL, it looks a little bit complex, but the performance is very good now, it takes only 0.447 seconds. There are two Hash Match joins that are used to replace the original Nested Loops from EMPLOYEE to full table scan DEPARTMENT.

Although the steps to the final rewrite is a little bit complicated, this kind of rewrites can be achieved by Tosska SQL Tuning Expert for SQL Server automatically, it shows that the rewrite is more than 600 times fastAlthough the steps to the final rewrite is a little bit complicated, this kind of rewrites can be achieved by Tosska SQL Tuning Expert for SQL Server automatically, it shows that the rewrite is more than 600 times faster than the original SQL.

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited

How to Tune SQL Statements to Run SLOWER… but Make Users Feel BETTER (Oracle)?

MySQL database and SQL

Your end-users may keep on complaining about some functions of their database application are running slow, but you may found that those SQL statements are already reached their maximum speed in the current Oracle and hardware configuration. There may be no way to improve the SQL unless you are willing to upgrade your hardware. To make your users feel better, sometimes, you don’t have to tune your SQL to run faster but to tune your SQL to run slower for certain application’s SQL statements.

This is an example SQL that is used to display the information from tables Emp_sal_hist and Employee if they are satisfied with certain criteria. This SQL is executed as an online query and users have to wait for at least 5 seconds before any data will be shown on screen after the mouse click.

select * from employee a,emp_sal_hist c
where a.emp_name like ‘A%’
     and a.emp_id=c.sal_emp_id
     and c.sal_salary<1800000
order by c.sal_emp_id

Here the following is the query plan and execution statistics of the SQL, it takes 10.41 seconds to extract all 79374 records and the first records return time ”Response Time” is 5.72 seconds. The query shows a MERGE JOIN of EMPLOYEE and EMP_SAL_HIST table, there are two sorting operations of the corresponding tables before it is being merged into the final result. It is the reason that users have to wait at least 5 seconds before they can see anything shows on the screen.

As the condition “a.emp_id = c.sal_emp_id”, we know that “ORDER BY c.sal_emp_id“ is the same as “ORDER BY a.emp_id“,  as SQL syntax rewrite cannot force a specified operation in the query plan for this SQL, I added an optimizer hint /*+ INDEX(@SEL$1 A EMPLOYEE_PK) */ to reduce the sorting time of order by a.emp_id.

SELECT  /*+ INDEX(@SEL$1 A EMPLOYEE_PK) */ *
FROM    employee a,
      emp_sal_hist c
WHERE a.emp_name LIKE ‘A%’
    AND a.emp_id=c.sal_emp_id
    AND c.sal_salary<1800000
ORDER BY c.sal_emp_id

Although the overall Elapsed Time is 3 seconds higher in the new query plan, the response time is now reduced from 5.72 seconds to 1.16 seconds, so the users can see the first page of information on the screen more promptly and I believe most users don’t care whether there are 3 more seconds for all 79374 records to be returned. That is why SQL tuning is an art rather than science when you are going to manage your users’ expectations.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for Oracle automatically.

https://tosska.com/tosska-sql-tuning-expert-pro-tse-pro-for-oracle/

How to Tune SQL Statement with “< ANY (subquery)” Operator for Oracle?

database query optimization

Here the following is a simple SQL statement with a “< ANY (Subquery)” syntax.

SELECT  *
FROM    employee
WHERE  emp_salary< ANY (SELECT emp_salary
              FROM  emp_subsidiary
              where  emp_dept=‘AAA’
              )

Here the following is the query plan of the SQL, it takes 18.49 seconds to finish. The query shows a “TABLE ACCESS FULL” of EMPLOYEE table and “MERGE JOIN SEMI” to a VIEW that is composed of a HASH JOIN of two indexes “INDEX RANGE SCAN” of EMP_SUBSIDIARY.

You can see that it is not an efficient query plan if we know that the emp_salary of EMP_SUBSIDIARY is a not null column, we can rewrite the SQL into the following syntax. The Nvl(Max(emp_salary),-99E124)is going to handle the case that if the subquery returns no record, the -99E124 representing the minimum number that the emp_salary can store to force an unconditional true for the subquery comparison.

SELECT  *
FROM    employee
WHERE  emp_salary < (SELECT  Nvl(Max(emp_salary),-99E124)
            FROM   emp_subsidiary
            WHERE  emp_dept = ‘AAA’)

Here is the query plan of the rewritten SQL and the speed is 0.01 seconds which is 1800 times better than the original syntax. The new query plan shows an “INDEX RANGE SCAN” instead of “TABLE ACCESS FULL” of EMPLOYEE.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for Oracle automatically, there are other rewrites with similar performance, but it is not suitable to discuss in this short article, maybe I can discuss later in my blog.

https://tosska.com/tosska-sql-tuning-expert-pro-tse-pro-for-oracle/

How to Tune SQL Statements to Run SLOWER… but Make Users Feel BETTER (MySQL)?

MySQL database and SQL

Your end-users may keep on complaining about some functions of their database application are running slow, but you may found that those SQL statements are already reached their maximum speed in the current MySQL and hardware configuration. There may be no way to improve the SQL unless you are willing to upgrade your hardware. To make your users feel better, sometimes, you don’t have to tune your SQL to run faster but to tune your SQL to run slower for certain application’s SQL statements.

This is an example SQL that is used to display the information from tables Emp_subsidiary and Employee if they are satisfied with certain criteria. This SQL is executed as an online query and users have to wait for at least 5 seconds before any data will be shown on screen after the mouse click.

select  *
from    employee a,
         emp_subsidiary b
where   a.emp_id = b.emp_id
         and a.emp_grade < 1050
         and b.emp_salary < 5000000
order by a.emp_id

Here the following is the query plan and execution statistics of the SQL, it takes 5.48seconds to extract all 3645 records and the first records return time ”Response Time(Duration)” is 5.39 seconds. The query shows a “Full Table Scan b (emp_subsidiary)” to Nested-Loop “a (employee)” table, an ORDER operation is followed by sorting the returned data by emp_id. You can see there is a Sort Cost=7861.86 at the ORDER step on the query plan. It is the reason that users have to wait at least 5 seconds before they can see anything shows on the screen.

To reduce the sorting time of a.emp_id, since a.emp_id=b.emp_id, so I can rewrite the order by clause from “order by a.emp_id” to “order by b.emp_id”, MySQL now can eliminate the sorting time by using the EMPLOYEE_PK after the nested loop operation.

select  *
from    employee a,
         emp_subsidiary b
where   a.emp_id = b.emp_id
         and a.emp_grade < 1050
         and b.emp_salary < 5000000
order by b.emp_id

Although the overall Elapsed Time is higher in the new query plan, you can see that the response time is reduced from 5.397 seconds to 0.068, so the users can see the first page of information on the screen instantly and they don’t care whether there are 2 more seconds for all 3,645 records to be returned. That is why SQL tuning is an art rather than science when you are going to manage your users’ expectations.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

Optimization in SQL: Answering 4 Commonly-Asked Questions

optimization of sql queries

A SQL query or statement is tasked with fetching the required information from the database. While the same output can be gained from different statements, they are likely to work at different performance levels.

The difference in performance output makes a lot of difference because a millisecond of lapse in query execution can result in huge losses for the organization. This makes it extremely necessary to ensure the best statement is being used, which is where optimization in SQL is considered.

#1: What is Query Optimization in Databases?

Query optimization in databases is the general process of picking out the most efficient way of obtaining data from the database i.e. carrying out the best query for a given requirement. Since SQL is nonprocedural, it can be processed, merged, and reorganized as seen fit by the optimizer and the database.

The database enhances each query on the basis of various statistics gathered about the information fetched from it. On the other hand, the optimizer selects the optimal plan for a query after assessing different access techniques including index and full-table scans. Various join methods and orders are also used along with certain probable transformations.

#2: What is Query Cost in Optimization?

Query cost is a metric that helps examine execution plans and determine the optimal one. Depending on the SQL statement and the environment, the optimizer sets an estimated numerical cost for every step throughout potential plans and considers an aggregate to derive the overall cost estimate for it.

The total query cost of a query is the sum of the costs incurred at every step in it. Since query cost is a comparative estimate of the resources needed to carry out every step of an execution plan, it doesn’t have any unit. The optimizer picks out the plan with the least cost projection once it has completed all its calculations of all the available plans.

#3: Is Query Cost the Best Way to Judge Performance?

In a word: No. Why? Although query cost proves useful in comprehending the manner in which a specific query is optimized, we must bear in mind its main goal: helping the optimizer select decent execution plans.

It does not offer a direct measure of parameters such as CPU, IO, memory, duration that are significant to users waiting for a statement to finish running. In other words, a low query cost won’t necessarily mean the plan is optimal or the query in question is the quickest. Similarly, a high query cost can prove more efficient in comparison, which is why it is not recommended to depend too much on query cost when considering performance.

Being a CPU-intensive operation query optimization in SQL takes a lot of resources to determine the best plan among the ones present. Time also needs to be factored in here as the user may not always have the time it may take for this entire process to take place. 

Therefore, the resources required to optimize a statement, those required to run the statement, and the time it takes for all of this to be done with shouldn’t exceed each other. 

#4: How Can We Optimize a SQL Query?

Query optimization often needs extra resources, such as the addition of indexes. However, we can boost query performance by simply rewriting a statement to decrease resource consumption without further expenses.

This lets us save significant resources, money, and time (if a query optimization tool is used). Through query optimization in SQL, we can focus on specific areas that are causing latency instead of examining the entire procedure. In such cases, looking for sections that are taking up more resources will help us narrow down the search and fix issues more quickly.

How to Tune Bad Performance SET ROWCOUNT SQL Statements for SQL Server?

sql performance monitoring

Some SQL statements will be running very slow after SET ROWCOUNT or TOP is used.  SET ROWCOUNT and TOP are used to tell SQL Server to select a specific number of rows from the SQL statements instead of extracting all records. Not many people know that SQL Server will try to re-optimize your SQL statements after you adding SET ROWCOUNT or TOP, the result is normally good after re-optimization of your SQL statements that can generate query plans for retrieving the first few records as fast as possible.

Good Example for Query Re-optimization for SET ROWCOUNT

Here the following is an example that shows the SQL takes 6.78 seconds to retrieve 217500 rows from the database, the query plan shows a good plan with a Hash Match for two Table Scan of [DEPARTMENT] and [EMPLOYEE].

The following screen shows the new query plan is generated after the SET ROWCOUNT 1 is used, the query plan is changed from Hash Match to Nested Loops. Nested Loops operation normally provides faster first few records retrieval time but may not be good for overall records extraction in certain situations. It is good to see that SQL Server uses only 0.013 seconds to extract the first row for this SQL.

Bad Example for Query Re-optimization for SET ROWCOUNT

Let’s see a bad example that shows how SQL Server degrades a good query plan to a bad query plan after the SET ROWCOUNT 1 is used. Here the following is an example that shows the SQL takes 0.118 seconds to retrieve 1613 rows from the database, the query plan is a little bit complex but it is a good query plan to retrieve all 1613 rows.

The following screen shows the new query plan is generated after the SET ROWCOUNT 1 is used, the query plan is now changed to Nested Loops with two Table Scans. The new query plan takes 1.312 seconds to extract only the first record, it is even slower than the 0.118 seconds that is used to extract all 1613 rows from the database.

How to Solve This Problem?

We can use Hints injection or SQL syntax rewrite to influence SQL Server to get back the original plan or generate an even better query plan for the SET ROWCOUNT or TOP operation. The following Hints injection generated a good query plan that is almost 90 times better than the original SQL with SET ROWCOUNT 1.

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited

How to Tune Cold Cache SQL Statements for SQL Server?

sql optimizer for sql server

For SQL statements that are not executed frequently, so that the relevant data is no longer exists in the buffer cache, a cold cache will significantly affect the performance of a SQL statement. A good performance SQL for hot cache may not be performing well in a cold cache environment. Experience developers will tune their SQL running well for both environments.

Here the following is an example SQL:

select * from
EMPLOYEE A
 where  A.EMP_ID IN (SELECT B.EMP_ID from EMP_SUBSIDIARY B
                      where B.EMP_DEPT < ‘D’)

Here the following is the query plan in the Tosska proprietary tree format, it takes 8.024 seconds for the first execution with cache delay and it takes 3.7 seconds for the second execution without caching time.

According to the query plan, you may find that the most significant IO consumption is the Table Scan of [EMPLOYEE] table. To simulate the cold cache environment, we can use the DBCC DROPCLEANBUFFERS command to clear the data cache before each execution of rewritten or optimized SQL statement.

Let me add an optimizer hint OPTION(LOOP JOIN) to the SQL and try to change the query plan from a Hash Match to a Nested Loop join. So, the EMP_ID(EMPLOYEE_PK) and a RID Lookup to [EMPLOYEE] will be used instead of using Table Scan. I hope that the RID Lookup can select fewer data from hard disk with matched EMP_ID in both [EMPLOYEE] and [EMP_SUBSIDIARY].

select *
from  EMPLOYEE A
where A.EMP_ID in (select B.EMP_ID
          from   EMP_SUBSIDIARY B
          where   B.EMP_DEPT < ‘D’) OPTION(LOOP JOIN)

Here the following is the query plan, the time is reduced from 8.024 seconds to 1.565 seconds with data cache overhead, and the physical reads are also dropped from 190,621 to 39,044. It shows a wrong IO estimation If you just rely on the SQL Server’s EstimateIO x EstimiateExecutions in the query plan.

There are other even better tuning solutions for this SQL with the A.I. SQL tuning tool in the following:

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited

The following SQL with an optimizer hint generate a more complicated query plan with the best execution time of 0.7 seconds. The SQL is tuned by cold cache simulation that data will be flushed before each execution of SQL alternatives.

Optimization in SQL: Determining Stale Statistics in Oracle

Optimization in SQL

You can determine whether statistics are stale in Oracle using two methods. The first is to make Oracle tell you if it considers the stats are stale.

The second involves a comparison of the statistics of what the DBMS assumes a table to be – and what it really is. Here, we’ll help you understand and determine whether the stats are stale.

Other Reasons Why Oracle May Pick a Bad Plan (Aside from Stale Statistics)

Good statistics aren’t necessarily always perfect – they only have to be correct to a degree for the information in the table. In case your statements are running sluggishly because Oracle picked a bad plan, you may also take it as a fair sign that your statistics are stale.

That said, stale statistics aren’t the only culprits behind Oracle selecting a bad plan, triggering the need for optimization in SQL. There are others, such as the following:

  • If your data doesn’t include enough histograms, extended statistics, or joins (both correlated and anti-correlated ones), it may not be uniform enough for Oracle to perceive it as it should.
  • Oracle might find it hard to calculate the amount of data to expect. This generally happens when a query is written in a way that confuses the DBMS.
  • When Oracle lacks a precise representation of the duration involving the completion of one or more operations. The system statistics may be incorrect in such cases.
  • The data set may not be sufficiently represented in Oracle’s version of the statistics. This usually happens in large tables with highly varied data that a histogram cannot accurately represent.
  • Certain indexes typically used by Oracle may have become invalid.

Coaxing Oracle to Determine Stale Statistics

You can have Oracle tell you about stale stats easily if your priority is to improve performance of SQL query.  The catch is, you won’t be able to determine how stale those stats are. However, you will know if there have been enough changes in a table for Oracle to consider regathering statistics on it.

To find out if that’s the case, you’ll need to view the stale stats column in DBA_STATISTICS which you can do with the following query:

select stale_stats

from dba_statistics

where owner = ‘<name of table owner>’’

AND table_name = ‘<name of table>’

The column may return “YES”, “NO” or other results, indicating Oracle’s stance on the stats. “YES” means Oracle is ready to re-gather statistics, whereas “NO” shows Oracle believes the statistics don’t need updating.

The column may return null, indicating incomplete or absent stats altogether. Take care to enter the correct table owner and table names as the query won’t return any rows otherwise.

Checking Stats on Your Own: What to Do in Oracle Database and SQL

When you do this manually, you can find out “how stale” your stats are. That’s because you’ll be able to collect stats on the “stalest” tables first, reducing the number of changes needed to be made to the database. This way, you could also avoid accumulating stats in situations that could lead to contention.

The goal here is to draw a comparison between Oracle’s values and the table’s actual values. Typically, a difference of up to 10 per cent between the two is acceptable. Also, this method requires us to check two separate kinds of statistics –

  • Table Level Statistics – As the name suggests, you can verify various aspects of the table like the following:
  1. A number of rows and empty blocks – You can use a query like this:

DBA_TAB_STATISTICS.NUM_ROWS

select count(*)

from <table name>;

  1. Number of Empty blocks and
  2. A number of blocks taken up by the table – The statement below will help you retrieve the number of “occupied blocks”:

DBA_TAB_STATISTICS.BLOCKS –

DBA_TAB_STATISTICS.EMPTY_BLOCKS

select count(distinct substr(rowid, 7, 9))

from <table name>;

  • Column Level Statistics – These will help you determine things like:
  1. How many distinct values exist in a column – This proves useful for expressions that include “COLUMN = <any value>”.

Try the following:

select count(distinct <columnname>)

from <table name>;

  1. The column’s high and low values – These values prove useful for situations that need range-based predicates, such as those involving COLUMN <= <any value> or COLUMN between <START> and <END>.
  1. The number of nulls in a column – This query should help if you want to view the stats of a particular column: 

with cte (x)

as

(

   select /*+ inline */ <column name>

   from <table owner name>.<table name that you want to check>

)

select (select approx_count_distinct(x) from cte) distinct_values

     , (select count(*) – count(x) from cte) num_nulls

     , (select min(x) from cte) low_value

     , (select max(x) from cte) high_value

from dual