For some SQL statements with multiple Max() functions in the select list and nothing in the Where clause, we have different methods to create new indexes to improve the SQL speed.
Here is an example SQL, it is to retrieve the maximum name and age from the employee table.
select max(emp_name),
max(emp_age)
from employee
The following is the query plan that takes 9.27 seconds.
The SQL cannot be tuned by SQL syntax rewrite or hints injection, and the SSMS cannot recommend any index to improve the SQL.
For this kind of SQL that we can consider building a composite index or two individual indexes for emp_name and emp_age. A new composite of these two columns (emp_age, emp_name) can improve the SQL around 7 times. The following is the query plan shows that the new composite index is used, but it has to scan the entire index for these two stream aggregate operations before getting the max(emp_name) and max(emp_age).
How about if we build two individual indexes for emp_name and emp_age. The following is the result and query plan of these two indexes created. A Top operator selects the first row from each index and returns to the Stream Aggregate operation, and then a Nested Loops join the two maximum results together. It is 356 times much faster than the original SQL.
You may suffer from SQL statements with a slow first execution time due to the long data cache process. The following SQL is simple that retrieves records from the EMPLOYEE table that if EMP_SALARY < 500000 and the result set is ordered by EMP_NAME.
Select emp_id,
emp_name,
emp_salary,
emp_address,
emp_telephone
from employee
where emp_salary < 500000
order by emp_name;
The following is the query plan that takes 9.51 seconds for the first execution and takes 0.99 seconds for the second execution without data cache.
The SQL cannot be tuned by SQL syntax rewrite or hints injection for both the first execution and the second execution, it is because SQL Server has selected the best query plan for this simple SQL statement. But the problem is that if the condition “where emp_salary < 500000” is changed; say from 500000 to 510000 or the EMPLOYEE data is flushed out from the memory, the execution time will then be prolonged up to 9.51 seconds.
Let’s see if we can build indexes to improve this situation. There is a common perception that a good index can help to improve both the first execution time and the second execution time. So, I use a tool to explore a lot of indexes configurations, but none of them can improve both executions’ performance. Here the following is the performance of the second execution with data cached for different indexes proposed by the tool. You can see the performance of “Index Set 1” is close to the original SQL performance with a little performance variation due to the system’s loading status and all other indexes sets are worse than the original SQL. Normally, we will give up the tuning of the SQL statement without even trying to see whether those recommended indexes are good for the first execution time.
I did a test for those recommended indexes to see whether they are helpful to improve the first execution time, it surprises me that the “Index Set 1” is tested with a significant improvement and improves the first execution time from 9.51 seconds to 0.65 seconds. It is a 14 times improvement that can make my database run more efficiently. So, you should be very careful to tune your SQL with new indexes that may not be good for your second execution with all data cached, but it may be very good for your first execution without data cached.
This kind of indexes recommendation can be achieved by Tosska SQL Tuning Expert Pro for SQL Server automatically.
For some SQL statements with multiple Max() functions in the select list and nothing in the Where clause, we have different methods to create new indexes to improve the SQL speed.
Here is an example SQL, it is to retrieve the maximum name and age from the employee table.
select max(emp_name),
max(emp_age)
from employee
The following is the query plan that takes 9.27 seconds.
The SQL cannot be tuned by SQL syntax rewrite or hints injection, and the SSMS cannot recommend any index to improve the SQL.
For this kind of SQL that we can consider building a composite index or two individual indexes for emp_name and emp_age. A new composite of these two columns (emp_age, emp_name) can improve the SQL around 7 times. The following is the query plan shows that the new composite index is used, but it has to scan the entire index for these two stream aggregate operations before getting the max(emp_name) and max(emp_age).
How about if we build two individual indexes for emp_name and emp_age. The following is the result and query plan of these two indexes created. A Top operator selects the first row from each index and returns to the Stream Aggregate operation, and then a Nested Loops join the two maximum results together. It is 356 times much faster than the original SQL.
This kind of indexes recommendation can be achieved by Tosska SQL Tuning Expert Pro for SQL Server automatically.
For some SQL statements that are failed to be tuned by syntax rewrite, hints injection, and all necessary indexes are built, people may think that hardware upgrade is the only way to resolve the performance problem. But, please don’t undermine your SQL Server’s SQL optimizer which can provide you with the ultimate performance solution that you may not have imagined before. What you need to do is to provide SQL Server with a set of proper new indexes.
Here is an example SQL, it is to retrieve the minimum employee’s salary and the emp_id that with salary greater than all salary of the emp_subsidiary table with subsidiary’s employees’ department = “AAA”.
SELECT emp_id,
(SELECT min(emp_salary)
FROM employee)
FROM employee
WHERE emp_salary > (SELECT max(emp_salary)
FROM emp_subsidiary
where emp_dept = ‘AAA’)
Although all columns that show in the SQL are indexed, the following query plan takes 44 seconds.
The SQL cannot be tuned by SQL syntax rewrite or hints injection, and the SSMS can recommend only one index on one table for a SQL statement, it is failed to recommend any good index. So, the SQL cannot be tuned in any traditional way.
Let’s use our new A.I. index recommendation engine to see if there are any good index solutions. A set of indexes is recommended listed in the following. It takes only 0.55 seconds.
Example: 80 times faster A.I. SQL index recommendation
The query plan shows that two new indexes are used at the same time that the SSMS is not able to provide.
Tosska SQL Tuning Expert Pro is in-built with an A.I. engine to recommend indexes for multiple tables at the same time for a SQL statement. The new technology is so powerful to recommend multiple tables’ new indexes for a SQL at the same time, it means that how each new table’s indexes affect each other in the query plan will be considered by the engine. It is very helpful for SQL Server’s SQL optimizer to explore more potential query plans that could not be generated before. So, don’t undermine your SQL Server’s ability. Instead, use the right tool to tune your SQL statements before you are planning to upgrade your hardware.
SELECT *
FROM EMPLOYEE
WHERE
CASE
when emp_id < 1001000 then ‘Old Employee’
when emp_dept <‘B’ then ‘Old Department’
ELSE ‘Normal’
END = ‘old Employee’
If I change the hardcoded literal to a @var, what will be the performance of the last blog’s rewritten SQL?
SELECT *
FROM EMPLOYEE
WHERE
CASE
when emp_id < 1001000 then ‘Old Employee’
when emp_dept <‘B’ then ‘Old Department’
ELSE ‘Normal’
END = @var
I use the same method in my last blog to rewrite this SQL into the following multiple OR syntax, but the SQL Server optimizer change back to a full table scan of the EMPLOYEE table. It is because the SQL Server cannot do a good cardinality estimation of the variable of @var.
select *
from EMPLOYEE
where emp_id < 1005000
and ‘Old Employee’ = @var
or not ( emp_id < 1005000 )
and emp_dept < ‘B’
and ‘Old Department’ = @var
or not ( emp_id < 1005000 )
and not ( emp_dept < ‘B’ )
and ‘Normal’ = @var
We can rewrite the CASE expression into the following syntax with multiple UNION ALL statements, this syntax is more complicated than the rewrite with multiple OR conditions in my last blog. But it can make SQL Server improve the query plan to be more efficient.
select *
from EMPLOYEE
where emp_id < (select 1005000)
and ‘Old Employee’ = @var
union all
select *
from EMPLOYEE
where ( not ( emp_id < 1005000 )
and ‘Old Employee’ = @var
or @var is null )
and emp_id >= 1005000
and emp_dept < ‘B’
and ‘Old Department’ = @var
union all
select *
from EMPLOYEE
where ( not ( emp_id < 1005000 )
and ‘Old Employee’ = @var
or @var is null )
and ( not ( emp_id >= 1005000
and emp_dept < ‘B’
and ‘Old Department’ = @var )
or @var is null )
and emp_id >= 1005000
and emp_dept >= ‘B’
and ‘Normal’ = @var
Here is the query plan of the rewritten SQL and the speed is 0.448 seconds. It is 5 times better than the original syntax. People may think that there are two table scan operations of EMPLOYEE that will slow down the whole process, but actually, the corresponding filter operations will stop the table scan operations immediately due to the filter conditions ‘Normal’ = @var and ‘Old Department’ = @var will not be satisfied. This kind of query plan cannot be generated by SQL Server’s internal SQL optimizer, it means that you cannot use Hints injection to get this query plan.
This kind of rewrite can be achieved by Tosska SQL Tuning Expert for SQL Server automatically.
Here the following is a simple SQL statement with a CASE expression syntax.
SELECT * FROM EMPLOYEE WHERE CASE when emp_id < 1001000 then‘Old Employee’ when emp_dept <‘B’then‘Old Department’ ELSE‘Normal’ END = ‘old Employee’
Here the following are the query plans of this SQL, it takes 2.23 seconds in a cold cache situation, which means data will be cached during the SQL is executing. The query shows a Full Table Scan of the EMPLOYEE table due to the CASE expression cannot utilize the emp_id index or emp_dept index.
We can rewrite the CASE expression into the following syntax with multiple OR conditions.
select * from EMPLOYEE where emp_id < 1005000 and‘Old Employee’ = ‘Old Employee’ or not ( emp_id < 1005000 ) and emp_dept < ‘B’ and‘Old Department’ = ‘Old Employee’ or not ( emp_id < 1005000 ) and not ( emp_dept < ‘B’ ) and‘Normal’ = ‘Old Employee’
Here is the query plan of the rewritten SQL and the speed is 0.086 seconds. It is 25 times better than the original syntax. The new query plan shows an Index Seek of EMP_ID index.
This SQL rewrite is useful when the CASE expression is equal to a hardcoded literal, but if the literal “ =’Old Employee’ ” replaced by a variable “ = :var ”, this rewrite may not be useful, I will discuss it in my next blog.
This kind of rewrite can be achieved by Tosska SQL Tuning Expert for SQL Server automatically.
Here the following is a simple SQL statement with a CASE expression syntax.
SELECT * FROM EMPLOYEE WHERE CASE when emp_id < 1001000 then‘Old Employee’ when emp_dept <‘B’then‘Old Department’ ELSE‘Normal’ END = ‘old Employee’
Here the following are the query plans of this SQL, it takes 2.23 seconds in a cold cache situation, which means data will be cached during the SQL is executing. The query shows a Full Table Scan of the EMPLOYEE table due to the CASE expression cannot utilize the emp_id index or emp_dept index.
We can rewrite the CASE expression into the following syntax with multiple OR conditions.
select * from EMPLOYEE where emp_id < 1005000 and‘Old Employee’ = ‘Old Employee’ or not ( emp_id < 1005000 ) and emp_dept < ‘B’ and‘Old Department’ = ‘Old Employee’ or not ( emp_id < 1005000 ) and not ( emp_dept < ‘B’ ) and‘Normal’ = ‘Old Employee’
Here is the query plan of the rewritten SQL and the speed is 0.086 seconds. It is 25 times better than the original syntax. The new query plan shows an Index Seek of EMP_ID index.
This SQL rewrite is useful when the CASE expression is equal to a hardcoded literal, but if the literal “ =’Old Employee’ ” replaced by a variable “ = :var ”, this rewrite may not be useful, I will discuss it in my next blog.
This kind of rewrite can be achieved by Tosska SQL Tuning Expert for SQL Server automatically.
Some SQL statements will be running very slow after SET ROWCOUNT or TOP is used. SET ROWCOUNT and TOP are used to tell SQL Server to select a specific number of rows from the SQL statements instead of extracting all records. Not many people know that SQL Server will try to re-optimize your SQL statements after you adding SET ROWCOUNT or TOP, the result is normally good after re-optimization of your SQL statements that can generate query plans for retrieving the first few records as fast as possible.
Good Example for Query Re-optimization for SET ROWCOUNT
Here the following is an example that shows the SQL takes 6.78 seconds to retrieve 217500 rows from the database, the query plan shows a good plan with a Hash Match for two Table Scan of [DEPARTMENT] and [EMPLOYEE].
The following screen shows the new query plan is generated after the SET ROWCOUNT 1 is used, the query plan is changed from Hash Match to Nested Loops. Nested Loops operation normally provides faster first few records retrieval time but may not be good for overall records extraction in certain situations. It is good to see that SQL Server uses only 0.013 seconds to extract the first row for this SQL.
Bad Example for Query Re-optimization for SET ROWCOUNT
Let’s see a bad example that shows how SQL Server degrades a good query plan to a bad query plan after the SET ROWCOUNT 1 is used. Here the following is an example that shows the SQL takes 0.118 seconds to retrieve 1613 rows from the database, the query plan is a little bit complex but it is a good query plan to retrieve all 1613 rows.
The following screen shows the new query plan is generated after the SET ROWCOUNT 1 is used, the query plan is now changed to Nested Loops with two Table Scans. The new query plan takes 1.312 seconds to extract only the first record, it is even slower than the 0.118 seconds that is used to extract all 1613 rows from the database.
How to Solve This Problem?
We can use Hints injection or SQL syntax rewrite to influence SQL Server to get back the original plan or generate an even better query plan for the SET ROWCOUNT or TOP operation. The following Hints injection generated a good query plan that is almost 90 times better than the original SQL with SET ROWCOUNT 1.
For SQL statements that are not executed frequently, so that the relevant data is no longer exists in the buffer cache, a cold cache will significantly affect the performance of a SQL statement. A good performance SQL for hot cache may not be performing well in a cold cache environment. Experience developers will tune their SQL running well for both environments.
Here the following is an example SQL:
select * from
EMPLOYEE A
where A.EMP_ID IN (SELECT B.EMP_ID from EMP_SUBSIDIARY B
where B.EMP_DEPT < ‘D’)
Here the following is the query plan in the Tosska proprietary tree format, it takes 8.024 seconds for the first execution with cache delay and it takes 3.7 seconds for the second execution without caching time.
According to the query plan, you may find that the most significant IO consumption is the Table Scan of [EMPLOYEE] table. To simulate the cold cache environment, we can use the DBCC DROPCLEANBUFFERS command to clear the data cache before each execution of rewritten or optimized SQL statement.
Let me add an optimizer hint OPTION(LOOP JOIN) to the SQL and try to change the query plan from a Hash Match to a Nested Loop join. So, the EMP_ID(EMPLOYEE_PK) and a RID Lookup to [EMPLOYEE] will be used instead of using Table Scan. I hope that the RID Lookup can select fewer data from hard disk with matched EMP_ID in both [EMPLOYEE] and [EMP_SUBSIDIARY].
select *
from EMPLOYEE A
where A.EMP_ID in (select B.EMP_ID
from EMP_SUBSIDIARY B
where B.EMP_DEPT < ‘D’) OPTION(LOOP JOIN)
Here the following is the query plan, the time is reduced from 8.024 seconds to 1.565 seconds with data cache overhead, and the physical reads are also dropped from 190,621 to 39,044. It shows a wrong IO estimation If you just rely on the SQL Server’s EstimateIO x EstimiateExecutions in the query plan.
There are other even better tuning solutions for this SQL with the A.I. SQL tuning tool in the following:
The following SQL with an optimizer hint generate a more complicated query plan with the best execution time of 0.7 seconds. The SQL is tuned by cold cache simulation that data will be flushed before each execution of SQL alternatives.
In my last article that a SQL statement with an Exists subquery was improved 90 times by the following rewrite.
SELECT *
FROM DEPARTMENT
where exists (select ‘x’
from employee
where emp_id > 2700000
and emp_dept=DPT_ID)
Query Plan:
Rewritten SQL syntax:
select *
from DEPARTMENT
where DPT_ID in (select isnull(emp_dept,emp_dept)
from employee
where emp_id > 2700000
group by emp_dept)
Query Plan:
Syntax Rewrite Solution
Syntax rewrite technique to improve SQL statements are commonly used by DBA or developers especially for Oracle or MySQL databases, but syntax rewrite is not easy to be applied by users who are using MS SQL Server or IBM Db2 LUW. The reason is that MS SQL Server and IBM Db2 LUW have a strong internal rewrite engine in their SQL optimizer. The internal SQL rewrite engine will try to rewrite a SQL syntax to their internal canonical syntax. It means that no matter how you rewrite your SQL statement, MS SQL Server and IBM Db2 LUW will try to rewrite the SQL back to their internal presumed good syntax, so it is difficult to tune a SQL if the so-called presumed good syntax is not good, since users are not easy to influence database SQL optimizer to generate a better query plan by simple SQL syntax rewrite.
Query Hints Injection Solution
To solve this problem, SQL Server provides Query Hints feature for users to help its SQL optimizer generate a better query plan. It is not like the SQL syntax rewrite method, experienced developers may tell what the final query plan will be for a rewritten syntax, Query Hints is a pinpoint solution that a query hint injection is normally applied to the specific step of the entire query plan, but a change to a plan step will incur domino effect to other plan steps in the entire query plan since MS SQL Server must adjust other plan steps to achieve what the user’s expectation for the query hint in the SQL statement. So, the final query plan is not easy to predict by users, especially for complex SQL statements.
The following SQL with Hints injection generated by Tosska SQL Tuning Expert is around 4 times better than the original SQL and takes 0.639 seconds.
select *
from DEPARTMENT
where exists ( select ‘x’
from employee
where emp_id > 2700000
and emp_dept = DPT_ID) OPTION(LOOP JOIN,HASH GROUP)
There is an even better SQL with Hints injected, it is around 50 times better than original SQL and takes 0.055 seconds. This query plan is pretty close to the rewrite tuning in my last article.
select *
from DEPARTMENT
where exists ( select ‘x’
from employee WITH(INDEX(EMPS_DPT_INX))
where emp_id > 2700000
and emp_dept = DPT_ID)
Syntax Rewrite plus Hints Injection Solution
For some SQL statements, a separate syntax rewrite method or a hints injection method may not be able to solve a complex SQL performance problem individually, some people may think that will it be possible if we rewrite a SQL and apply hints at the same time to improve a SQL statement? Yes, it is possible in the Tosska SQL Tuning Expert A.I. engine, this technology can solve more SQL performance problems by a computer algorithm ever before. I will discuss this technology later in my blog.
The following screen show Tosska SQL Tuning Expert can generate 178 distinguished query plans after investigated 300 SQL Hints injection, it is far out of what a human expert can achieve within 10 minutes. MS SQL Server is the most sensitive to Query Hints Injection database in the market, SQL Server query hints are normally able to influence SQL optimizer to generate a specific query plan, so the SQL tuning for MS SQL Server is far more challenging than other databases.
New Enterprise product Tosska DB Ace Enterprise for Oracle (DBAO™) targeted for DBAs are now available. Download now.