Optimization in SQL: Determining Stale Statistics in Oracle

Optimization in SQL

You can determine whether statistics are stale in Oracle using two methods. The first is to make Oracle tell you if it considers the stats are stale.

The second involves a comparison of the statistics of what the DBMS assumes a table to be – and what it really is. Here, we’ll help you understand and determine whether the stats are stale.

Other Reasons Why Oracle May Pick a Bad Plan (Aside from Stale Statistics)

Good statistics aren’t necessarily always perfect – they only have to be correct to a degree for the information in the table. In case your statements are running sluggishly because Oracle picked a bad plan, you may also take it as a fair sign that your statistics are stale.

That said, stale statistics aren’t the only culprits behind Oracle selecting a bad plan, triggering the need for optimization in SQL. There are others, such as the following:

  • If your data doesn’t include enough histograms, extended statistics, or joins (both correlated and anti-correlated ones), it may not be uniform enough for Oracle to perceive it as it should.
  • Oracle might find it hard to calculate the amount of data to expect. This generally happens when a query is written in a way that confuses the DBMS.
  • When Oracle lacks a precise representation of the duration involving the completion of one or more operations. The system statistics may be incorrect in such cases.
  • The data set may not be sufficiently represented in Oracle’s version of the statistics. This usually happens in large tables with highly varied data that a histogram cannot accurately represent.
  • Certain indexes typically used by Oracle may have become invalid.

Coaxing Oracle to Determine Stale Statistics

You can have Oracle tell you about stale stats easily if your priority is to improve performance of SQL query.  The catch is, you won’t be able to determine how stale those stats are. However, you will know if there have been enough changes in a table for Oracle to consider regathering statistics on it.

To find out if that’s the case, you’ll need to view the stale stats column in DBA_STATISTICS which you can do with the following query:

select stale_stats

from dba_statistics

where owner = ‘<name of table owner>’’

AND table_name = ‘<name of table>’

The column may return “YES”, “NO” or other results, indicating Oracle’s stance on the stats. “YES” means Oracle is ready to re-gather statistics, whereas “NO” shows Oracle believes the statistics don’t need updating.

The column may return null, indicating incomplete or absent stats altogether. Take care to enter the correct table owner and table names as the query won’t return any rows otherwise.

Checking Stats on Your Own: What to Do in Oracle Database and SQL

When you do this manually, you can find out “how stale” your stats are. That’s because you’ll be able to collect stats on the “stalest” tables first, reducing the number of changes needed to be made to the database. This way, you could also avoid accumulating stats in situations that could lead to contention.

The goal here is to draw a comparison between Oracle’s values and the table’s actual values. Typically, a difference of up to 10 per cent between the two is acceptable. Also, this method requires us to check two separate kinds of statistics –

  • Table Level Statistics – As the name suggests, you can verify various aspects of the table like the following:
  1. A number of rows and empty blocks – You can use a query like this:

DBA_TAB_STATISTICS.NUM_ROWS

select count(*)

from <table name>;

  1. Number of Empty blocks and
  2. A number of blocks taken up by the table – The statement below will help you retrieve the number of “occupied blocks”:

DBA_TAB_STATISTICS.BLOCKS –

DBA_TAB_STATISTICS.EMPTY_BLOCKS

select count(distinct substr(rowid, 7, 9))

from <table name>;

  • Column Level Statistics – These will help you determine things like:
  1. How many distinct values exist in a column – This proves useful for expressions that include “COLUMN = <any value>”.

Try the following:

select count(distinct <columnname>)

from <table name>;

  1. The column’s high and low values – These values prove useful for situations that need range-based predicates, such as those involving COLUMN <= <any value> or COLUMN between <START> and <END>.
  1. The number of nulls in a column – This query should help if you want to view the stats of a particular column: 

with cte (x)

as

(

   select /*+ inline */ <column name>

   from <table owner name>.<table name that you want to check>

)

select (select approx_count_distinct(x) from cte) distinct_values

     , (select count(*) – count(x) from cte) num_nulls

     , (select min(x) from cte) low_value

     , (select max(x) from cte) high_value

from dual

How to Tune SQL Statement with EXISTS Subquery for SQL Server I ?

sql server tuning tools

The following is an example that shows a SQL statement with an Exists subquery. The SQL retrieves records from the DEPARTMENT table that DPT_ID is found in emp_dept of employee table with emp_id > 2700000.

SELECT *
FROM DEPARTMENT
where exists (select ‘x’
         from employee
         where emp_id > 2700000
         and emp_dept=DPT_ID)

Here the following is the query plan in the Tosska proprietary tree format, it takes 2.23 seconds to finish.

The query plan shows two Hash Match from [EMPLOYEE].[EMPLOYEE_PK] to [EMPLOYEE].[EMPS_DPT_INX] and then Merge Join to a sorted [DEPARTMENT] table. This query plan looks reasonable, but the number of records scan from [EMPLOYEE] is too expensive at the first stage, can we use the small [DEPARTMENT] table to scan back the [EMPLOYEE] table to improve the SQL.

Let me rewrite the EXISTS subquery into an IN subquery in the following, but the query plan is not changed as expected.

select  *
from DEPARTMENT
where  DPT_ID in (select   emp_dept
         from     employee
         where  emp_id > 2700000)

I further rewrite the SQL and add the dummy function “isnull(emp_dept,emp_dept)” in the select list, but it cannot stop the operation of Hash Match to [EMPLOYEE].[EMPS_DPT_INX].

select  *
from DEPARTMENT
where  DPT_ID in (select    isnull(emp_dept,emp_dept)
         from      employee
         where   emp_id > 2700000)

To further enforce the restriction for stoping the operation “Hash Match to [EMPLOYEE].[EMPS_DPT_INX]”, I try to add a dummy “group by emp_dept” operation in the subquery.

select  *
from DEPARTMENT
where  DPT_ID in (select    isnull(emp_dept,emp_dept)
         from      employee
         where   emp_id > 2700000
         group by emp_dept)

Here the following is the query plan after the final rewrite, SQL server first uses a Table Scan of [DEPARTMENT] table and Nested Loop of “EMPS_DPT_INX index seek to RID Lookup of [EMPLOYEE]” with the Top 1 operation, so each record from [DEPARTMENT] table will match at most one record from [EMPLOYEE] only. The speed now is 0.024 seconds and is much faster than the original SQL.

Although the steps to the final rewrite is a little bit complicated, this kind of rewrites can be achieved by Tosska SQL Tuning Expert for SQL Server automatically, it shows that the rewrite is more than 90 times faster than the original SQL.  

Tosska SQL Tuning Expert (TSES™) for SQL Server® – Tosska Technologies Limited

How to Tune SQL Statements with CONCAT Operator for MySQL?

oracle sql performance tuning

There may be some business requirements that need to compare concatenate strings and column with a given unknown length of the bind variable. Here is an example SQL that retrieves data from EMPLOYEE and DEPARTMENT tables where employee’s department ID must concatenate two strings before it is compared to an unknown length of variable @dpt_var

select * from employee,department
where concat(concat(‘A’,emp_dept),‘B’) = @dpt_var
and  emp_dept= dpt_id

Here the following are the query plans of this SQL, it takes 23.8 seconds to finish. The query shows a “Full Table Scan Employee” to nested loop Department table.

You can see that this SQL cannot utilize index scan even the emp_dept is an indexed field. Let me add a “force index(EMPS_DPT_INX) hints to the SQL and hope it can help MySQL SQL optimizer to use index scan, but it fails to enable the index scan anyway, so I add one more dummy condition emp_dept >= ” , it is an always true condition that emp_dept should be greater or equal to a smallest empty character. It is to fool MySQL SQL optimizer that emp_dept’s index is a reasonable step.

select  *
from  employee force index(EMPS_DPT_INX),
     department
where  concat(concat(‘A’,emp_dept),‘B’) = @dpt_var
     and emp_dept >= ”
     and emp_dept = dpt_id

Here is the query plan of the rewritten SQL and it is running faster. The new query plan shows that an Index Range Scan is used for Employee table first and then nested loop Department table.

This kind of rewrite can be achieved by Tosska SQL Tuning Expert for MySQL automatically, it shows that the rewrite is more than 3 times faster than the original SQL.

https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/

SQL Performance Tuning: Frequent Questions about Indexes

SQL performance tuning

A database is a piece of software operating on a computer, which means it is dependent and likely to face the same limitations as other software present on that computer. In other words, it will only be able to process as much data as the hardware can handle.

One of the best ways to speed up queries is to perform SQL performance tuning. In this post, we will answer some of the most frequent questions involving databases and indexes.

What is Indexing in SQL Query Optimization?

Indexing is one of the first things you may have come across while learning the ropes of your database. It is a wonderful tool that enables users to enhance the efficiency of their database. However, bear in mind that not every database requires indexing, and not all indexes are helpful in SQL performance tuning.

Let’s learn more about indexing: what it is and how it helps in enhancing database performance.

How do Indexes Affect SQL Query Performance?

An Index can locate data swiftly without having to go through each row in the table. This saves plenty of time! 

Certain data columns are required before you can create an index. These are –

  • The Search Key which holds a duplicate of the primary key
  • The Data Reference which has a set of pointers

All of these constitute the structure of one index. To understand how an index works, let us take an example. Suppose you need to look for a bit of data in your database. Rather than scour every line yourself, you make the computer search each row till it locates the information. Remember that the search is bound to take much longer if the requisite information is located at the end. Fortunately, you have the option to sort alphabetically to shorten the length of such queries.

What are the Types of Database Indexes?

Database indexes are of two kinds –

Clustered indexes – These arrange data using the primary key. The reason behind using a clustered index is to make sure the primary key is saved in ascending order. This is the same order in which the table stores memory.

A clustered index is automatically created when the primary key is set, which helps in SQL tuning for Oracle in the long run as well.

Non-clustered indexes – A non-clustered index is a data structure that boosts data fetching speed. It is different from clustered indexes, as they are made by data analysts or developers.

When and How Should We Use Indexes?

Since indexes are intended to accelerate database performance, you should apply them whenever you think they can simplify the use of the database. Although smaller databases may not have several opportunities to use indexes, they are likely to see the benefits of indexing as they grow into larger databases. 

You can make sure your indexes keep performing well, if you test run a set of queries on your database first. Clock the time those queries take to execute and begin creating your indexes after that. Keep rerunning these ‘tests’ for continuous improvements.

Conclusion

Indexing has its challenges, the biggest one being determining the best ones for every table.

For instance, heaps require clustered indexes because searching for a record in a heap table is comparable to finding a needle in a haystack: it’s inefficient and time-consuming, thanks to the heap’s unordered structure.

On the other hand, locating data is simpler and faster from a table that contains a proper clustered index, just like finding a name in a list that’s alphabetically ordered. DBAs, therefore, recommend that every SQL table contains a proper clustered index. Now that you know how indexes work and how they can optimize database performance, you should be able to use them to reduce query times substantially. If you would like more tips on how to use indexing, or you need a SQL query optimization tool for your database, let our experts know!