How to Tune SQL Statement with EXISTS Subquery for SQL Server II ?

Optimization in SQL

In my last article that a SQL statement with an Exists subquery was improved 90 times by the following rewrite.

SELECT *
FROM DEPARTMENT
where exists (select ‘x’
         from employee
         where emp_id > 2700000
         and emp_dept=DPT_ID)

Query Plan:

Rewritten SQL syntax:

select  *
from DEPARTMENT
where  DPT_ID in (select    isnull(emp_dept,emp_dept)
         from      employee
         where   emp_id > 2700000
         group by emp_dept)

Query Plan:

Syntax Rewrite Solution
Syntax rewrite technique to improve SQL statements are commonly used by DBA or developers especially for Oracle or MySQL databases, but syntax rewrite is not easy to be applied by users who are using MS SQL Server or IBM Db2 LUW. The reason is that MS SQL Server and IBM Db2 LUW have a strong internal rewrite engine in their SQL optimizer. The internal SQL rewrite engine will try to rewrite a SQL syntax to their internal canonical syntax. It means that no matter how you rewrite your SQL statement, MS SQL Server and IBM Db2 LUW will try to rewrite the SQL back to their internal presumed good syntax, so it is difficult to tune a SQL if the so-called presumed good syntax is not good, since users are not easy to influence database SQL optimizer to generate a better query plan by simple SQL syntax rewrite.

Query Hints Injection Solution
To solve this problem, SQL Server provides Query Hints feature for users to help its SQL optimizer generate a better query plan. It is not like the SQL syntax rewrite method, experienced developers may tell what the final query plan will be for a rewritten syntax, Query Hints is a pinpoint solution that a query hint injection is normally applied to the specific step of the entire query plan, but a change to a plan step will incur domino effect to other plan steps in the entire query plan since MS SQL Server must adjust other plan steps to achieve what the user’s expectation for the query hint in the SQL statement. So, the final query plan is not easy to predict by users, especially for complex SQL statements.

The following SQL with Hints injection generated by Tosska SQL Tuning Expert is around 4 times better than the original SQL and takes 0.639 seconds.

select   *
from  DEPARTMENT
where exists ( select  ‘x’
        from  employee
        where emp_id > 2700000
           and emp_dept = DPT_ID) OPTION(LOOP JOIN,HASH GROUP)

There is an even better SQL with Hints injected, it is around 50 times better than original SQL and takes 0.055 seconds. This query plan is pretty close to the rewrite tuning in my last article.

select   *
from  DEPARTMENT
where exists ( select  ‘x’
        from  employee WITH(INDEX(EMPS_DPT_INX))
        where emp_id > 2700000
           and emp_dept = DPT_ID)

Syntax Rewrite plus Hints Injection Solution
For some SQL statements, a separate syntax rewrite method or a hints injection method may not be able to solve a complex SQL performance problem individually, some people may think that will it be possible if we rewrite a SQL and apply hints at the same time to improve a SQL statement? Yes, it is possible in the Tosska SQL Tuning Expert A.I. engine, this technology can solve more SQL performance problems by a computer algorithm ever before. I will discuss this technology later in my blog.

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited

The following screen show Tosska SQL Tuning Expert can generate 178 distinguished query plans after investigated 300 SQL Hints injection, it is far out of what a human expert can achieve within 10 minutes. MS SQL Server is the most sensitive to Query Hints Injection database in the market, SQL Server query hints are normally able to influence SQL optimizer to generate a specific query plan, so the SQL tuning for MS SQL Server is far more challenging than other databases.

How to Tune SQL Statement with EXISTS Subquery for SQL Server I ?

sql server tuning tools

The following is an example that shows a SQL statement with an Exists subquery. The SQL retrieves records from the DEPARTMENT table that DPT_ID is found in emp_dept of employee table with emp_id > 2700000.

SELECT *
FROM DEPARTMENT
where exists (select ‘x’
         from employee
         where emp_id > 2700000
         and emp_dept=DPT_ID)

Here the following is the query plan in the Tosska proprietary tree format, it takes 2.23 seconds to finish.

The query plan shows two Hash Match from [EMPLOYEE].[EMPLOYEE_PK] to [EMPLOYEE].[EMPS_DPT_INX] and then Merge Join to a sorted [DEPARTMENT] table. This query plan looks reasonable, but the number of records scan from [EMPLOYEE] is too expensive at the first stage, can we use the small [DEPARTMENT] table to scan back the [EMPLOYEE] table to improve the SQL.

Let me rewrite the EXISTS subquery into an IN subquery in the following, but the query plan is not changed as expected.

select  *
from DEPARTMENT
where  DPT_ID in (select   emp_dept
         from     employee
         where  emp_id > 2700000)

I further rewrite the SQL and add the dummy function “isnull(emp_dept,emp_dept)” in the select list, but it cannot stop the operation of Hash Match to [EMPLOYEE].[EMPS_DPT_INX].

select  *
from DEPARTMENT
where  DPT_ID in (select    isnull(emp_dept,emp_dept)
         from      employee
         where   emp_id > 2700000)

To further enforce the restriction for stoping the operation “Hash Match to [EMPLOYEE].[EMPS_DPT_INX]”, I try to add a dummy “group by emp_dept” operation in the subquery.

select  *
from DEPARTMENT
where  DPT_ID in (select    isnull(emp_dept,emp_dept)
         from      employee
         where   emp_id > 2700000
         group by emp_dept)

Here the following is the query plan after the final rewrite, SQL server first uses a Table Scan of [DEPARTMENT] table and Nested Loop of “EMPS_DPT_INX index seek to RID Lookup of [EMPLOYEE]” with the Top 1 operation, so each record from [DEPARTMENT] table will match at most one record from [EMPLOYEE] only. The speed now is 0.024 seconds and is much faster than the original SQL.

Although the steps to the final rewrite is a little bit complicated, this kind of rewrites can be achieved by Tosska SQL Tuning Expert for SQL Server automatically, it shows that the rewrite is more than 90 times faster than the original SQL.  

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited