In my last article that a SQL statement with an Exists subquery was improved 90 times by the following rewrite.
SELECT *
FROM DEPARTMENT
where exists (select ‘x’
from employee
where emp_id > 2700000
and emp_dept=DPT_ID)
Query Plan:
Rewritten SQL syntax:
select *
from DEPARTMENT
where DPT_ID in (select isnull(emp_dept,emp_dept)
from employee
where emp_id > 2700000
group by emp_dept)
Query Plan:
Syntax Rewrite Solution
Syntax rewrite technique to improve SQL statements are commonly used by DBA or developers especially for Oracle or MySQL databases, but syntax rewrite is not easy to be applied by users who are using MS SQL Server or IBM Db2 LUW. The reason is that MS SQL Server and IBM Db2 LUW have a strong internal rewrite engine in their SQL optimizer. The internal SQL rewrite engine will try to rewrite a SQL syntax to their internal canonical syntax. It means that no matter how you rewrite your SQL statement, MS SQL Server and IBM Db2 LUW will try to rewrite the SQL back to their internal presumed good syntax, so it is difficult to tune a SQL if the so-called presumed good syntax is not good, since users are not easy to influence database SQL optimizer to generate a better query plan by simple SQL syntax rewrite.
Query Hints Injection Solution
To solve this problem, SQL Server provides Query Hints feature for users to help its SQL optimizer generate a better query plan. It is not like the SQL syntax rewrite method, experienced developers may tell what the final query plan will be for a rewritten syntax, Query Hints is a pinpoint solution that a query hint injection is normally applied to the specific step of the entire query plan, but a change to a plan step will incur domino effect to other plan steps in the entire query plan since MS SQL Server must adjust other plan steps to achieve what the user’s expectation for the query hint in the SQL statement. So, the final query plan is not easy to predict by users, especially for complex SQL statements.
The following SQL with Hints injection generated by Tosska SQL Tuning Expert is around 4 times better than the original SQL and takes 0.639 seconds.
select *
from DEPARTMENT
where exists ( select ‘x’
from employee
where emp_id > 2700000
and emp_dept = DPT_ID) OPTION(LOOP JOIN,HASH GROUP)
There is an even better SQL with Hints injected, it is around 50 times better than original SQL and takes 0.055 seconds. This query plan is pretty close to the rewrite tuning in my last article.
select *
from DEPARTMENT
where exists ( select ‘x’
from employee WITH(INDEX(EMPS_DPT_INX))
where emp_id > 2700000
and emp_dept = DPT_ID)
Syntax Rewrite plus Hints Injection Solution
For some SQL statements, a separate syntax rewrite method or a hints injection method may not be able to solve a complex SQL performance problem individually, some people may think that will it be possible if we rewrite a SQL and apply hints at the same time to improve a SQL statement? Yes, it is possible in the Tosska SQL Tuning Expert A.I. engine, this technology can solve more SQL performance problems by a computer algorithm ever before. I will discuss this technology later in my blog.
The following screen show Tosska SQL Tuning Expert can generate 178 distinguished query plans after investigated 300 SQL Hints injection, it is far out of what a human expert can achieve within 10 minutes. MS SQL Server is the most sensitive to Query Hints Injection database in the market, SQL Server query hints are normally able to influence SQL optimizer to generate a specific query plan, so the SQL tuning for MS SQL Server is far more challenging than other databases.
There may be some business requirements that need to compare concatenate strings and column with a given unknown length of the bind variable.
Here is an example SQL that retrieves data from EMPLOYEE and DEPARTMENT tables where employee’s department ID must concatenate two strings before it is compared to an unknown length of variable @dpt_var
select * from employee,department
where concat(concat(‘A’,emp_dept),‘B’) = @dpt_var
and emp_dept= dpt_id
Here the following are the query plans of this SQL, it takes 23.8 seconds to finish. The query shows a “Full Table Scan Employee” to nested loop Department table.
You can see that this SQL cannot utilize index scan even the emp_dept is an indexed field. Let me add a “force index(EMPS_DPT_INX) hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_dept >= ” , it is an always true condition that emp_dept should be greater or equal to a smallest empty character. It is to fool MySQL SQL optimizer that emp_dept’s index is a reasonable step.
select *
from employee force index(EMPS_DPT_INX),
department
where concat(concat(‘A’,emp_dept),‘B’) = @dpt_var
and emp_dept >= ”
and emp_dept = dpt_id
Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used for Employee table first and then nested loop Department table.
This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL.
The following is an example shows a SQL statement with two conditions “emp_dept=dpt_id and emp_dept<‘L’”
select * from employee,department
where emp_dept=dpt_id
and emp_dept<‘L’
and emp_id<1500000
and emp_salary= dpt_avg_salary
order by dpt_avg_salary
Here the following is the query plan of this SQL in Tosska proprietary tree format, it takes 8.84 seconds to finish.
The query plan looks reasonable that shows a full table scan of DEPARTMENT to nested-loop EMPLOYEE table, the records in EMPLOYEE table being nested-loop must satisfy with the condition “emp_id<1500000” and the corresponding index EMPS_SALARY_INX is also used. Due to the number of records in the first driving table in a Nested Loop Join is very critical to the join performance, we should find a way to narrow down the number of result records of DEPARTMENT table before it is used to nested-loop EMPLOYEE table.
As the conditions “emp_dept=dpt_id and emp_dept<‘L’”, it implies that “dpt_id < ‘L’” is also true, let me add this extra condition to the SQL, it helps MySQL SQL optimizer to make a better decision with more information provided by the new SQL syntax, this technique is especially useful for MySQL database. Remark: Oracle or MS SQL Server are doing very good on their internal Transitive Dependency Improvement in their SQL optimizer already, so this technique may not work for Oracle and MS SQL Server.
select *
from employee,
department
where emp_dept = dpt_id
and dpt_id < ‘L’
and emp_dept < ‘L’
and emp_id < 1500000
and emp_salary = dpt_avg_salary
order by dpt_avg_salary
Let’s see the DEPARTMENT is now being filtered by the new condition “dpt_id < ‘L’ “ with an index range scan. You can see the estimated Rows 401 of DEPARTMENT table is now being trimmed down to 176. The rewritten SQL now takes only 3.8 seconds with such a simple change in syntax.
This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that this rewrite is more than 2 times faster than the original SQL with such an easy change in the syntax. https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/
The following is an example shows a SQL statement with two union operator in a subquery. The SQL retrieve records from EMPLOYEE table that EMP_ID should satisfy with the union result set from two queries in a subquery.
select * from employee
where emp_id IN
(select emp_id from emp_subsidiary where emp_grade=1000
union
select emp_id from employee where emp_dept=‘AAA’)
Here the following are the query plans in Tosska proprietary tree format, it takes 3 minutes 27 seconds to finish.
The query plan shows a full table scan of EMPLOYEE table and the attached subquery will be executed for each of scanned record. So, you can see the query plan is very inefficient. If we know the union result set is small and it should be executed first, and then use EMP_ID index to retrieve EMPLOYEE table. Let me rewrite the Union subquery as a derived table expression in the following:
select *
from employee
where emp_id in (select emp_id
from (select emp_id
from emp_subsidiary
where emp_grade = 1000
union
select emp_id
from employee
where emp_dept = ‘AAA’) DT1)
Now, you can see the Union subquery is executed first and use it to retrieve the EMPLOYEE table by EMP_ID index. The overall query is now become more reasonable and efficient.
This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 60 times faster than the original SQL. There are some other rewrites with even better performance, but it is a little bit complicated to discuss in this short article, let’s discuss it in my coming blogs.
It is common that the performance is not good if a SQL statement with OR conditions. Let’s have an example show you how to tune those SQL statements in certain situations.
Here is an example SQL that extract records from EMPLOYEE table if (emp_grade < 1050 or emp_id<730000). Emp_grade and emp_id are indexed and they are not null field.
select * from employee
where emp_grade < 1050 or emp_id<730000
You can see MySQL SQL Optimizer use an Index Merge of emps_grade_inx and employee_pk to process the SQL, the performance is not good as expected since the result set is quite big for sort_union operation. It takes more than 40 seconds to finish the data retrieval. Let me rewrite the OR condition into the following UNION ALL statement, please make sure the emp_grade and emp_id are not null column, otherwise it may generate error result. The rewrite is simple that the first part extract data with emp_grade<1050, the second part of the UNION ALL retrieve records that satisfied with emp_id<730000, but it is not retrieved in the first part of the UNION ALL.
select *
from employee
where emp_grade < 1050
union all
select *
from employee
where not ( emp_grade < 1050 )
and emp_id < 730000
Here the following is the query plan of this SQL, it takes 12.46 seconds to finish. The query shows two “Index Range Scan” of EMPLOYEE_PK and EMPS_GRADE_INX to the employee table.
This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL. There are some other rewrites shown in this screen with comparable results too.
There may be some business requirements that need to compare certain part of a column as a data retrieval criteria.
Here is an example SQL that retrieves data from EMPLOYEE table employee’s name with a string pattern “Acco” start from 5 character of the emp_name.
select *
from employee
where substr(emp_name,5,4)=‘Acco’
Here the following are the query plans of this SQL, it takes 17 seconds to finish. The query shows a “Full Table Scan Employee”
You can see that this SQL cannot utilize index scan even the emp_name is indexed field. Let me add a “Force Index(emp_name_inx)“ hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_name >= ‘ ‘ , it is an always true condition that emp_name should be greater or equal to a smallest empty character.
select *
from employee force index(emp_name_inx)
where substr(emp_name,5,4) = ‘Acco’
and emp_name >= ‘ ‘
Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used now.
This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 6 times faster than the original SQL.
There may be some business requirements that needs to retrieve only some rows from a table (or join tables) randomly. This kind of SQL is normally hard to tune.
For example the following SQL retrieve one row from two tables join of Employee and Department, where Employee’s department code is ‘AAA’, and both Emp_dept and Dpt_id columns are indexed.
select *
from employee ,department
where emp_dept=dpt_id
and emp_dept = ‘AAA’
order by rand()
limit 1
Here the following is the query plan of this SQL, it takes 3.126 seconds to finish. The query shows a nested loop from Department table to Employee table to extract all records with ‘AAA’ department code. An Order operation is executed followed from the join result.
You can see the most expensive step is to extract all Employee data with department code “AAA”, if there is an unique key such Employee ID (EMP_ID) which can uniquely identify a row from the query, you can use the With common table expressions in MySQL version 8 to randomly select rows from the KEY column only with the same conditions given by the original query, for example the following blue colored SQL text in “With” clause, it randomly select 1 row of EMP_ID from the join query. It not only significantly reduces the size of the data retrieved from Employee table, but it also shrank the size of Order By operation. And then the main query will use the selected EMP_ID to extract specific row from original query, so the whole query will run faster with this new rewrite syntax.
with DT1
as (select EMP_ID
from employee,
department
where emp_dept = dpt_id
and emp_dept = ‘AAA’
order by rand() limit 1)
select *
from employee,
department
where emp_dept = dpt_id
and emp_dept = ‘AAA’
and EMP_ID in (select EMP_ID
from DT1)
order by rand() limit 1
/* Remark: “order by rand() limit 1” is used to make sure that only 1 row will be selected if the EMP_ID cannot uniquely identify only one row */
Here is the query plan of the rewritten SQL with less cost and run much faster.
This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 100 times faster than the original SQL.
The LIKE is a logical operator that determines if a character string matches a specified pattern. A pattern may include regular characters and wildcard characters. The LIKE operator is used in the WHERE clause of the SELECT, UPDATE, and DELETE statements to filter rows based on pattern matching.
Here is an example SQL that retrieves data from EMPLOYEE table employee’s name with a string pattern like “Mary%”. If the emp_name is indexed, the following SQL will utilize Index Range Scan of the emp_name and the speed of the SQL will be fine.
select *
from employee
where emp_name like ‘Mary%’;
If user is looking for emp_name with pattern like ‘%Mary%’, MySQL SQL Optimizer cannot user emp_name index to speed up the process, full table scan is normally be used and the performance will be bad too.
select *
from employee
where emp_name like ‘Mary%’;
Here the following are the query plan of this SQL, it takes 18.8 seconds to finish. The query shows a “Full Table Scan” of employee table.
You can see that this SQL cannot utilize index scan even the emp_name is indexed. Let me add a “Force Index(emp_name_inx)“ hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_name >= ” , it is an always true condition that emp_name should always greater or equal to a smallest empty character.
select *
from employee force index(emp_name_inx)
where emp_name like ‘%Mary%’
and emp_name >= ‘ ‘;
Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used now.
This kind of rewrites can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is almost 5 times faster than the original SQL.
Application package software is a collection of software programs that is developed for the purpose of being licensed to third-party organizations. Although a package software may be tailored for a user’s specific needs through parameters or tables, the software itself is not developed specifically for an organization. So, users do not own the source code and have no way to modify the embedded SQL statements for performance tuning purpose. There are a lot of application packages built on Oracle RDBMS such as Siebel, PeopleSoft JD Edwards, SAP and so on. In order to help application packages’ users, Oracle provides some features for helping users to tune their SQL statements without the need to change their source code.
SQL profile
It is a profile generated by Oracle SQL Tuning Advisor. A SQL profile contains corrections for wrongly estimated statistics, auxiliary information. Therefore, SQL profile just guides the optimizer to a better plan, but they do not guarantee the same plan each time the statement is parsed. For certain SQL statements, no matter how good the statistics are corrected, Oracle SQL optimizer is still not able to generate a better plan in specific environments. For these kinds of SQL statements, human intervention is necessary, but SQL profile is not a convenient tool for developers to force Oracle to pick up a new plan without changing the program source code.
SQL plan baselines and stored outlines
Due to the Oracle environment changes or Oracle database version upgrade, it might target Oracle SQL optimizer to generate new plans for certain SQL statements. If it is not good, and we need something to preserve the old plans for the new environment. To achieve SQL plan stability, stored outlines was the major tool in earlier releases of Oracle Database. This feature is still supported in Oracle Database 11g; however, it might be depreciated in the future releases and replaced by SQL plan management. The mechanism of SQL Plan Baselines is to preserve the performance of specified SQL statements, regardless of changes in the database environment or release upgrade. Furthermore, create Plan Baselines manually for a SQL statement is possible, and this technique can help developers to guide Oracle SQL optimizer to generate a specific plan for a bad performance SQL statement. So, when Oracle SQL optimizer receive the same SQL statement next time, a better performance plan will be composed according to the new plan baselines stored in database. There is no need to change the SQL syntax in source programs.
For example, if you want to tune a SQL with execution plan-A that is currently used by Oracle SQL optimizer in your database, and you want to tune the SQL with Hints to make Oracle SQL optimizer to generate a new execution plan-B. What you have to do is to execute the tuned SQL with new Hints and use the following method provided by Oracle:
Execute the tuned SQL with Hints and plan B cached in SGA.
To enable the use of the tuned plan, manually alter the tuned plan to a fixed plan by setting its FIXED attribute to YES.
To enable the use of SQL plan baselines, make sure the OPTIMIZER_USE_SQL_PLAN_BASELINES initialization parameter is set to TRUE.
Weaknesses of using SQL Plan baselines for SQL tuning
As the SQL plan baselines was designed to preserve the performance of SQL statements such as after the following environment changes:
New optimizer version
Changes to optimizer statistics and optimizer parameters
Changes to schema and metadata definitions
Changes to system settings
SQL profile creating
You can see that it is not designed for the purpose of manual SQL tuning. There are also some additional limitations such as Parallel Hints is not supported by SQL Plan Baselines, you cannot load a Plan-B with Parallel Hints applied your SQL with bad performance of original Plan-A. Parallel Hints sometimes are very important for a better plan generation by Oracle SQL optimizer.
SQL Patches
SQL Patches is part of the features provided by SQL Repair Advisor which is used to fix a SQL statement’s critical failures such as returning wrong result. The SQL Repair Advisor analyzes the problematic statement and in many cases recommends a SQL patch to repair the statement. The SQL patch is to influence the Oracle SQL optimizer to choose an alternate execution plan for future executions, instead of using the original problematic execution plan. There is a public API call to create SQL patches manually provided by Oracle Database 12c Release 2 onwards. The DBMS_SQLDIAG.CREATE_SQL_PATCH package can help users to create a SQL Patch for specific SQL statement for SQL tuning purpose. You can change a bad performance SQL statement’s execution plan without the need to modify the program source code as the following example:
DECLARE
Patch_name VARCHAR2(32767);
BEGIN
Patch_name := SYS.DBMS_SQLDIAG.create_sql_patch(
sql_text => 'SELECT *
FROM employees
WHERE emps_dept IN
(SELECT dpts_id
FROM departments
WHERE dpts_avg_salary <200000)', hint_text => 'INDEX(@SEL$1 EMPLOYEES) INDEX(@SEL$2 DEPARTMENTS)',
name => 'my_sql_patch_name');
END;
If your database version is before Oracle database 12c Release 2, you must use this package DBMS_SQLDIAG_INTERNAL.i_create_patch instead. Both SQL text and SQL ID is able to be used for SQL hints injection. The injected hints for your SQL should be placed in hint_text input parameter. There is only one line of Hints text you can use for a SQL and there is no way to define your own query block name for any subqueries’ block. So, if your SQL has multiple subqueries and you want to instruct Oracle to do something in subqueries’ blocks, you must use Oracle default query block names in your injected hints text.
hint_text => ‘ INDEX(@SEL$1 EMPLOYEES) INDEX(@SEL$2 DEPARTMENTS) ‘
This hints text in the above example shows that @SEL$1 and @SEL$2 are default query block names provided by Oracle in the execution plan of the SQL. The Hints tells Oracle use index search for EMPLOYEES table in query block @SEL$1 and also use index search for DEPARTMENTS in query block @SEL$2.
Pros and cons of using SQL Patches to tune SQL
SQL Patches is more flexible to accept hints instructions without SQL Plan Baselines’ limitations, complex hints with parallel operations are normally accepted by SQL patches. There is no additional maintenance effort to tell Oracle to use the SQL Patches after it is created. Oracle will use the stored hints to optimize any SQL with the same SQL ID or SQL Text and generate a better performance execution plan. Furthermore, you can also use SQL Patches to disable a SQL with a destructive hints already written in a package application or even use it to control a bind-aware SQL execution behavior.
As the injected hints text must be placed in one text line and using default query block name only, manually compose a desire Hints to improve a SQL statement will be a difficult task for most SQL developers especially for complex SQL statements with many subqueries.
A tool to automatically create Hints and SQL Patches
There is only one tool in the market so far that is able to generate a better hints and create SQL Patch in a fully automatic way. Tosska SQL Tuning Expert Pro is a tool for users to improve SQL performance without touching their program source code. Users can deploy different performance query plans for various sizes of production databases without the effort of keeping multiple versions of the program source, and it is especially suitable for package application users who don’t own the source code of their applications. The tool will try most useful hints combinations to tune your bad performance SQL statement, the best Hints combination SQL performance will be benchmarking side by side with the original SQL. Users will get the exact performance improvement without any guesswork or uncertain cost assessment only.
A lot of people are talking about why or why not use Oracle In-memory database in their applications and most of them are too focused on the size of the database or whether it is an OLAP application. It seems that small and medium size databases are not suitable for using Oracle In-memory database option. But if your OLTP databases are suffering from performance bottleneck and you are looking for solutions, I think Oracle In-Memory database option should be on your solutions list, especially when you are planning to upgrade your hardware.
The SQL optimizer plan space is getting bigger in Oracle In-Memory database
Plan space is the size of potential process methods that database SQL optimizer will consider before processing your input SQL statements. A bigger plan space size means that database SQL optimizer will consider more potential methods to process your SQL statements. So, your SQL statements have more chances to run faster. With Oracle’s In-Memory database new In-Memory data access methods, Oracle SQL optimizer will consider a bigger plan space before the execution of your SQL statement. Due to the multiplier effect, a new In-Memory execution plan step might result in tremendous huge expansion of plan space size. For example, a SQL statement with 5 tables in In-Memory, a new “Table Access Inmemory Full” step to each table’s access method, the result is C(5,1)+C(5,2)+C(5,3)+C(5,4)+C(5,5)=5+10+10+5+1=31. But don’t forget this multiplier is applicable to every single execution plan in original plan space tree, it means the new plan space size might be potentially 31 times bigger than the original plan space.
How OLTP database get advantage from In-Memory database?
An OLTP database is a transaction-oriented application which requires quick response time for each transaction, but it doesn’t mean that there is no complex SQL for online report or data consolidation during business hours. If these kinds of slow SQL statements are running simultaneously with other online transactions, the overall performance will be affected. If hardware upgrade is one of the options, you should consider using Oracle In-Memory as one of the alternative solutions.
Here is an example that shows you how an OLTP SQL statement gets benefit from using Oracle In-Memory database option.
The following is a typical OLTP SQL with all tables analyzed, the original execution plan shows an Oracle adaptive plan that Hash Join or Nested Loop will be decided during the initial stage of execution. The elapsed time of this SQL is around 1 minute and 35 seconds.
Let’s put EMPLOYEE table into Oracle In-Memory with a force hint and Oracle use it with “Table Access Inmemory Full” scan, the following benchmark shows that the “Auto 1” SQL took only 0.45 second to finish with more than 200 times faster than the original execution plan. The solution indicates that the introduction of a new “Table Access Inmemory Full” scan to the EMPLOYEE table has actually created a much better execution plan compared with the original one.
There is no need to put all tables into In-Memory for OLTP’s SQL
When using an OLTP database, we don’t want to introduce too much overhead to the online transactions by populating all tables into In-Memory. So, the goal of In-Memory SQL tuning for OLTP SQL is not to select the best performance solution, but the most cost-effective solution with fewer tables In-Memory tables but still acceptable performance improvement instead.
The following “Auto 3” solution with all tables are put into In-Memory, but the improvement is just 0.01 second better than the “Auto 1” which only requires one table to be put into In-Memory. Therefore, it is obvious that “Auto 1” is a much more preferable choice.
Oracle should charge their In-Memory option by In-Memory size
Actually, Oracle In-Memory database is not only beneficial to OLAP users, but it is also a good SQL performance- enhancement tool for all database users like OLTP users. In view of high pricing of Oracle In-Memory database option which limits the rapid adaptation of this new technology, it is recommended that Oracle should decide the price according to the In-Memory size. In other words, a lower size with lower price will definitely help Oracle
In-Memory database to penetrate widely into all databases users, including OLTP users.
Author: Richard To (Richard.to@tosska.com), CTO of Tosska Technologies Limited