Tuesday, December 7, 2010

Difference Between In and Exists - Part 2


SQL> select * from smalltable1 where data1 in (select data5 from largetable);

100 rows selected.
 
SQL> select * from smalltable1 st where exists (select 1 from largetable lt where st.data1=lt.data5);

100 rows selected.
SQL> select * from largetable lt where data5 in (select data1 from smalltable1 st);

100 rows selected.

SQL> select * from largetable lt where exists (select 1 from smalltable1 st where st.data1=lt.data5);

100 rows selected.
Inference

From the preceding example, it can be inferred that usage of IN and EXISTS provided us with the same results for both Small table to large table and large table to small table.

Scenario 3: Large table to Large table

Now, the last scenario is to compare large table with another large table and observe the results. Let us create another table largetable2 and compare the results.

SQL> select * from largetable where data5 in (select data5 from largetable2);

100000 rows selected.
SQL> select * from largetable lt where exists (select 1 from largetable2 lt2 where lt.data5=lt2.data5);

100000 rows selected.

To Conclude:

In all the three scenarios, except for the negligible difference in statistics, it can be observed that both IN and EXISTS behave equally.

Difference Between In and Exists

Database Version: Oracle 10g
Operating System: Windows XP

I was actually under the impression that there is actually a good difference between IN and EXISTS. But to my surprise, I found something interesting and further digged into the details and found some good facts. I had a wrong conception, which got cleared after the experiment. I actually wanted to share the same with you.

Till Oracle 8i, IN and Exists were processed in different way by Oracle. However from Oracle 9i, predominantly in Oracle 10g they are much processed on the same way, thanks to CBO intelligent optimizer.

For the illustration purpose I have created four tables, two small and two big as follows. Now I will compare small to small, small to big, big to big using both IN and EXISTS and see what are the results.

Scenario 1: Small table to Small table

Table

          create table smalltable1 (data1 number(3), data2 varchar2(10), data3 varchar2(10), data4 varchar2(10));

Procedure

          create or replace procedure smalltableproc is
          counter number(3);
          begin
                   counter := 1;
                   while (counter <= 100)
                   loop   
                             insert into smalltable1 values (counter,'aa' || counter,'bb' || counter, 'cc' || counter);
                             counter := counter + 1;
                   end loop;
                   commit;
          end;

Index

          create index smalltable1data1idx on smalltable1(data1);
         
Procedure Execution

          exec smalltableproc;

Now, I will create the second small table from the first small table as follows:

          create table smalltable2 as select * from smalltable1;
          create index smalltabled2ata1idx on smalltable2(data1);

Before comparing the results, in general I used to update the statistics for both indexes and tables. Even though, creating the indexes will actually build the statistics, I always make it a point to do this.

SQL> exec DBMS_STATS.gather_table_stats('INXQA','SMALLTABLE1');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_table_stats('INXQA','SMALLTABLE2');

PL/SQL procedure successfully completed.

Now let us see the results between the usage of IN and EXISTS

SQL> select * from smalltable1 where data1 in (select data1 from smalltable2);

100 rows selected.

SQL> select * from smalltable1 ss where exists (select 1 from smalltable2 st where ss.data1=st.data1);

100 rows selected.

Inference
It can be inferred from the above example that both IN and EXISTS provided us with the same results for the smaller tables.
Now, let us take the next comparison of Small table to Big Table and Vice versa.

Scenario 2: Small table to Large table and Large Table to Small Table

Table

          create table largetable (data5 number(6), data6 varchar2(10), data7 varchar2(10), data8 varchar2(10));

Procedure

          create or replace procedure largetableproc is
          counter number(6);
          begin
                   counter := 1;
                   while (counter <= 100000)
                   loop
                             insert into largetable values (counter,'aa' || counter,'bb' || counter, 'cc' || counter);
                             counter := counter + 1;
                   end loop;
                   commit;
          end;

Index

          create index largetabledata5idx on largetable(data5);

Procedure Execution

          SQL> exec largetableproc

          PL/SQL procedure successfully completed.

Statistics

          SQL> exec DBMS_STATS.gather_table_stats('HARI','LARGETABLE');

          PL/SQL procedure successfully completed.


Sunday, November 28, 2010

Bitmap Indexes - Part 2

Case 1: Using BITMAP for BULK Load.

To illustrate this, I have created a table called bitmapdemotable as provided below:

    create table bitmapdemotable (runnumber number(6), runname varchar2(15), runtype number(2));

I have also created a procedure to populate this table. Please note that I am using a "While Loop" here to simulate row-by-row insert operation.

    create or replace procedure bitmapproc(noofrecords in number) is
    counter number;
    begin
        counter := 1;
        while (counter <= noofrecords)
        loop
            insert into bitmapdemotable values (counter, 'a' || counter,mod(counter,2));
            counter := counter + 1;
        end loop;
        commit;
    end;

Next, I have created an index on this table for the column "runtype" which actually holds either 0 or 1.

    create bitmap index bitmapidx on bitmapdemotable(runtype);

In order to study the behaviour, we need to use TKPROF. For this we need to change the configuration settings of TIMED_STATISTICS and SQL_TRACE

    alter session set timed_statistics=true;
    alter session set sql_trace=true;

Now, I am executing the procedure by passing the value as 50000.

    exec bitmapproc(50000)

When we see the trace file (after extracting the contents of the trace file into a local text file using TKPROF), it looks like the following figure:

The initial is for the procedure as a whole (does not include the data for the insert statement as it is separately captured) and the second is for the insert statement. Please note that the insert statement took more CPU, elapsed time and logical I/Os (both query and current). Now, please also note that ELAPSED is greater than CPU, which may be due to some wait events.

Now, instead of row-by-row insert we have captured the statistics for the bulk load and the result is shown in the following figure:
As expected, the SELECT statement, led to FULL TABLE SCAN. But, when we look at the  statistics, we can observe vast difference between the row-by-row and BULK load. The reason being, when we do a row-by-row insert, the B-tree is used to locate the leaf nodes that contain bitmap segments for a given value of the key. Start ROWID and the bitmap segments are used to locate the rows that contain the key value. When changes are made to the key column in the table, bitmaps must be modified. This results in the locking of the relevant bitmap segments. Because locks are acquired on the whole bitmap segment, a row that is covered by the bitmap cannot be updated by other transactions until the first transaction ends. Oracle server should repeat this for every row that gets either inserted or updated. Hence row by row manipulation on the rows consists of bitmap index is costly.

Bitmap Indexes - Part 4

SQL> select * from bitmapdemotable where runtype=1;
125000 rows selected.
SQL> select * from btreedemotable where runtype=1;
125000 rows selected.
This states that as far as the query is concerned, there is no difference between the query results on the tables that actually uses btree index and bitmap index. We have considered the scenario where we have two distinct values for run type. Now, let us create the similar type of indexes in another column in the same table and check. Also, we will drop the existing indexes on the runtype column.

drop index btreeidx;
drop index bitmapidx;

To create the indexes on runname column:

create index runnamebtree on btreedemotable(runname);
create bitmap index runnamebitmap on bitmapdemotable(runname);

exec DBMS_STATS.gather_index_stats('HARI', 'RUNNAMEBTREE');
exec DBMS_STATS.gather_index_stats('HARI', 'RUNNAMEBITMAP');

SQL> select * from bitmapdemotable where runname='a1000';
SQL> select * from btreedemotable where runname='a1000';
I actually tried to insert into the tables (with index on runname) and found the following:

SQL> insert into btreedemotable select * from btreedemotable;
250000 rows created.
SQL> insert into bitmapdemotable select * from bitmapdemotable;
250000 rows created.
Please notice the huge difference between the tables having different indexes on the similar column.

To conclude:
a)     There is no much difference between b*tree and bitmap as far as the query execution is concerned. Both are exhibiting similar statistics
b)     The difference comes when we try to perform DML operations on the indexes.

Bitmap Indexes - Part 3

Case 2: Comparison between B*Tree and Bitmap

Let us illustrate this point by identifying when a B*Tree can be used and when we should use Bitmap. Let us consider the same example of what we have done it in case 1 and see if there is any difference when we use b*tree instead of bitmap

Let us create a table for creating a b*tree index, as follows:

   create table btreedemotable as select * from bitmapdemotable where 1=0;
   create index btreeidx on btreedemotable(runtype);

Let us create the same procedure, but to populate btreedemotable.

create or replace procedure btreeproc(noofrecords in number) is
counter number;
begin
        counter := 1;
        while (counter <= noofrecords)
        loop
            insert into btreedemotable values (counter, 'a' || counter,mod(counter,2));
            counter := counter + 1;
        end loop;
        commit;
end;

alter session set timed_statistics=true;
alter session set sql_trace=true;
Please note that the row-by-row insert for a b*tree index based table works like a champ as compared to bitmap index based table. Now, we will try to select it from the table and see if there are any differences:

SQL> select * from btreedemotable where runtype=1;
25000 rows selected.
SQL> select * from bitmapdemotable where runtype=1;
25000 rows selected.
Except for the minor changes in the statistics and cost, there seems to be no big difference between the two tables having different index types. Now, let us check for the query that returns 125,000 records.

Bitmap Indexes



Database Version: Oracle 10g
Operating System: Windows XP

I was working on tuning a procedure in my current company, when I found something interesting with my colleague. He has created a table with a bitmap index for one of the columns. The insert into the table was found to be very slow and when I checked in the process, he is actually doing a single row operation (of around 50K every day) which is actually taking couple of minutes to insert. There was actually an argument with him and he claims that the slow in insert operation is not due to bitmap index and it is because of something else. Further he claimed that BITMAP index is similar in B*Tree index. Not only him, many could not understand when we should create BITMAP INDEX and what actually it is.

In this article, I will explain what is BITMAP Index, their operational difference (or comparison) and when do we need to create BITMAP indexes. As always said, please do provide your comments.

What is BITMAP index?

Oracle created BITMAP index to cater the need for data warehousing and adhoc query mechanism. In general, usage of bitmap index in OLTP is not a highly advisable solution. In a bitmap index, a bitmap for each key value replaces a list of rowids. Each bit in the bitmap corresponds to a possible rowid, and if the bit is set, it means that the row with the corresponding rowid contains the key value. Oracle has created a proprietary mapping function that converts the bit position to an actual rowid, so that the bitmap index provides the same functionality as a regular index. Bitmap indexes store the bitmaps in a compressed way.

Bitmap indexes are most appropriate on low distinct cardinality data (i.e., data with relatively few discrete values when compared to the cardinality of the entire set. It is difficult to define what low distinct cardinality is?. This is data where the number of distinct items in the set of rows divided by the number of rows is a small number (near zero). For example, in a table of 10000 rows, 1 or 2 distinct valued column would be appropriate for a bitmap index. However, in a table of 1000000 rows, 100 or 200 distinct valued column would be appropriate for a bitmap index. I have seen people telling me that columns with more than 5 or 10 distinct columns cannot participate in a bitmap index, which is actually wrong.

In general, when the data is packed for bitmap index, it is not packed like b*tree index. In case of b*tree index, the leaf node will have a list of rowids. In case of bitmap index, the leaf node contains bitmap, with the value set if the key value is available and not set if not. Each leaf node contains the entry header, key, start rowid, end rowid and bitmap segment. A bitmap segment consisting of a string of bits. The bit is set when the corresponding row contains the key value and is unset when the row does not contain the key value. Entry header, which contains the information on the number of columns and lock information. The Oracle server uses a patented compression technique to store bitmap segments and also bitmap Index uses restricted rowid format

Difference between Bitmap and B*Tree Indexes

Bitmap indexes actually store pointers to many rows with a single index key entry, as compared to a B*Tree. In B*Tree index, we will be able to pair the index keys and the corresponding rows in a table. In a bitmap index, there will be a very small number of index entries, each of which points to many rows. In a conventional B*Tree, one index entry points to a single row.

When do we need BITMAP index?


We need BITMAP index for data warehousing application, Adhoc Query Processing, and when the data load is quiet large(and we have bulk processing). We should never use a BITMAP index for OLTP processing. We shall consider couple of scenarios where in we check on the usage of BITMAP index and study the response. On the first case, I will show what happens if we use BITMAP index for row-by-row insert and on the second case, we shall see the operational difference between b*tree index and Bitmap Index.

Part 2     Part 3   Part 4

Friday, October 8, 2010

Facts on Indexes

Author:Hariharan T Sairam. My Best Friend :)

In a sequence of my learning and experiments, I found some interesting facts on Indexes, which I thought I can provide as an article.

Some of the most celebrated myths are:

a) Difference in creation of Indexes
b) Full table scan is evil
c) Rebuilding Index leads to faster retrieval of records

Myth 1: Difference in creation of Indexes

In this section, we shall consider two cases viz Indexes on individual columns and composite index.

Case 1: Indexes on individual columns

Let us take an example to illustrate the creation of indexes.

We first create a table called GENERALWORKS, with the following columns:

create table generalworks
(
GENNO NUMBER(7),
GENNAME VARCHAR2(40),
GENCATG VARCHAR2(2),
GENDESC VARCHAR2(3),
GENDETS VARCHAR2(6)
)

Also, for this table, let us create three indexes, one on each column as follows:

create index GENERALWORKSGENCATGIDX on generalworks(gencatg);
create index GENERALWORKSGENNMEIDX on generalworks(genname);
create index GENERALWORKSGENIDX on generalworks(genno);

Let me also insert 50,000 records into this table.

SQL> select count(*) from generalworks;

COUNT(*)
----------
50000

To be on the safer side, let me also gather the statistics for the table and indexes as follows:

SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'GENERALWORKS');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('HARI', 'GENERALWORKSGENCATGIDX');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('HARI', 'GENERALWORKSGENNMEIDX');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('HARI', 'GENERALWORKSGENIDX');

PL/SQL procedure successfully completed.

Scenario 1:

We will now retrieve the records based on the columns GENNAME and GENCATG

SQL> select * from generalworks where genname='aaaabbbb' and gencatg='ab';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4242548220
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 298 | 8046 | 99 (2)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 298 | 8046 | 99 (2)| 00:00:02 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 3 | BITMAP AND | | | | | |
| 4 | BITMAP CONVERSION FROM ROWIDS | | | | | |
|* 5 | INDEX RANGE SCAN | GENERALWORKSGENCATGIDX | 3860 | | 12 (0)| 00:00:01 |
| 6 | BITMAP CONVERSION FROM ROWIDS | | | | | |
|* 7 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 | | 16 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("GENCATG"='ab')
7 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
571 consistent gets
0 physical reads
0 redo size
160080 bytes sent via SQL*Net to client
3197 bytes received via SQL*Net from client
258 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3847 rows processed

From the above execution plan, it is quiet evident that the indexes are getting used and the cost is also less to retrieve 3,847 records. But, wait a minute, something different is seen in the plan. BITMAP? We have created only B*Tree indexes and we are seeing BITMAP in the plan? Reason being, in order to retrieve the data back faster, Oracle decided to implement the mechanism of creating a BITMAP INDEX on the fly. This is done only for certain types of query which will get benefited by this. Otherwise, No.

Also note that the execution plan shows 298 rows for the Step 1 and 0, but actual number processed was 3847. The reason being, execution plan is always an estimate and need not always be similar to the actual plan. Also, there are queries where the actual plan will never coincide with the estimated plan.

Scenario 2:

SQL> select * from generalworks where genname='aaaabbbb';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1336713796
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows| Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3860 | 101K|245 (1)| 00:00:03 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 3860 | 101K|245 (1)| 00:00:03 |
|* 2 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 | | 16 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
-----------------------------------------------------------

2 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
785 consistent gets
0 physical reads
0 redo size
160080 bytes sent via SQL*Net to client
3197 bytes received via SQL*Net from client
258 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3847 rows processed

Scenario 3:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb';

3077 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1336713796
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows |Bytes|Cost (%CPU)|Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3090 | 83430 |245 (1)| 00:00:03 |
|* 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 3090 | 83430 |245 (1)| 00:00:03 |
|* 2 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 | | 16 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GENNO">10000)
2 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
685 consistent gets
0 physical reads
0 redo size
90912 bytes sent via SQL*Net to client
2636 bytes received via SQL*Net from client
207 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3077 rows processed

Scenario 4:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb' and gencatg='ab

3077 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4242548220
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time|
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 239 | 6453 | 99 (2)| 00:00:02 |
|* 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 239 | 6453 | 99 (2)| 00:00:02 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 3 | BITMAP AND | | | | | |
| 4 | BITMAP CONVERSION FROM ROWIDS | | | | | |
|* 5 | INDEX RANGE SCAN | GENERALWORKSGENCATGIDX | 3860 | 12 (0)| 00:00:01 |
| 6 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 7 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 || 16 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GENNO">10000)
5 - access("GENCATG"='ab')
7 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
161 recursive calls
0 db block gets
555 consistent gets
0 physical reads
0 redo size
90912 bytes sent via SQL*Net to client
2636 bytes received via SQL*Net from client
207 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
3077 rows processed

Here again, the BITMAP conversion is used with the number of rows not tallying with the actual processed.

Scenario 5:

SQL> select * from generalworks where gencatg='ab';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 418290240
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3906 | 102K| 239 (1)| 00:00:03 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 3906 | 102K| 239 (1)| 00:00:03 |
|* 2 | INDEX RANGE SCAN | GENERALWORKSGENCATGIDX | 3906 | | 8 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("GENCATG"='ab')

Statistics
----------------------------------------------------------
203 recursive calls
0 db block gets
773 consistent gets
12 physical reads
0 redo size
160080 bytes sent via SQL*Net to client
3197 bytes received via SQL*Net from client
258 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
3847 rows processed

Scenario 6:

SQL> select * from generalworks where genno > 40000;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2559079067
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9999 | 263K| 77 (2)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 9999 | 263K| 77 (2)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | GENERALWORKSGENIDX | 9999 | | 23 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("GENNO">40000)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1406 consistent gets
24 physical reads
0 redo size
417145 bytes sent via SQL*Net to client
7707 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed




Note:
If we replace the condition “genno > 40000” with “genno > 10000” we will incur FULL TABLE SCAN as the number of records to be selected is 40,000.

Case 2: Composite Indexes

Instead of creating individual indexes for the 3 columns, let us drop the indexes and create a composite index (as below) and study its execution plan.

create index genworkscompindx on generalworks(genno,genname,gencatg);

In this, first position is GENNO, 2nd position is GENNAME and 3rd is GENCATG. Reason for creating this composite index is to analyze and understand the difference in creating indexes in these two different forms. Now, we shall analyze all the scenarios that we have tested and study the implications

Scenario 1

SQL> select * from generalworks where genname='aaaabbbb' and gencatg='ab';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 857713684
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 298 | 8046 | 280 (2)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| GENERALWORKS | 298 | 8046 | 280 (2)| 00:00:04 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GENNAME"='aaaabbbb' AND "GENCATG"='ab')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1514 consistent gets
0 physical reads
0 redo size
112736 bytes sent via SQL*Net to client
3197 bytes received via SQL*Net from client
258 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3847 rows processed

Please note that the parameter “consistent gets” is high and this query is undergoing full table scan. Also, the select statement took two seconds more than the first statement. We now need to identify the reason for the table scan (even though table scan is not evil). On the surface level, we might say that the columns GENNAME and GENCATG are in positions 2 and 3 and hence the index is not been used properly. If we execute the following statement

SELECT INDEX_NAME, COLUMN_NAME, NUM_ROWS, BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR FROM USER_INDEXES UI, USER_IND_COLUMNS UIC WHERE UI.INDEX_NAME=UIC.INDEX_NAME AND UI.TABLE_NAME=’GENERALWORKS’;

We observe that the CLUSTERING_FACTOR to be 264 and LEAF_BLOCKS to be 194, which actually shows that the indexes will be used as the CLUSTERING_FACTOR is almost close to LEAF_BLOCKS. However, the index will be used only as a whole (i.e.) GENNO, GENNAME and GENCATG (any combination of this). If we carefully study the rest of the scenarios, most will lead to table scan. Indexes will be used when we use all the columns that are indexed in the where clause.

Scenario 2:

SQL> select * from generalworks where genname='aaaabbbb';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 857713684
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3860 | 101K| 279 (2)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| GENERALWORKS | 3860 | 101K| 279 (2)| 00:00:04 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1514 consistent gets
0 physical reads
0 redo size
112736 bytes sent via SQL*Net to client
3197 bytes received via SQL*Net from client
258 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3847 rows processed

Please note that the value for consistent gets is high (almost doubled). Also we can observe that the query is undergoing table scan.

Scenario 3:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb';

3077 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1452202321
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3125 | 84375 | 176 (2)| 00:00:03 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 3125 | 84375 | 176 (2)| 00:00:03 |
|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 3125 | | 159 (2)| 00:00:02 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("GENNO">10000 AND "GENNAME"='aaaabbbb' AND "GENNO" IS NOT NULL)
filter("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
386 recursive calls
0 db block gets
806 consistent gets
376 physical reads
0 redo size
128826 bytes sent via SQL*Net to client
2636 bytes received via SQL*Net from client
207 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
3077 rows processed
There is something quite interesting happening in this query. Cost has come down from 245 to 176 for access by ROWID. However, the cost of the index range scan has increased drastically from 16 to 159. Here, the cost specified is for the whole index GENWORKSCOMPINDX, which comprises 3 columns and not a single column. Hence the cost is based on the whole index and not for an individual column. Now, if we drop this index and create an index for individual columns, the cost will be less as it will be based on a single column.

Scenario 4:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb' and gencatg='ab';

3077 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1452202321
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 244 | 6588 | 161 (2)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 244 | 6588 | 161 (2)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 244 | | 159 (2)| 00:00:02 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("GENNO">10000 AND "GENNAME"='aaaabbbb' AND "GENCATG"='ab' AND "GEN
NO" IS NOT NULL) filter("GENNAME"='aaaabbbb' AND "GENCATG"='ab')

Statistics
----------------------------------------------------------
141 recursive calls
0 db block gets
769 consistent gets
0 physical reads
0 redo size
128826 bytes sent via SQL*Net to client
2636 bytes received via SQL*Net from client
207 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
3077 rows processed

Here again, we could see an increase in the cost for the range scan.

Scenario 5:

SQL> select * from generalworks where gencatg='ab';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 857713684
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3906 | 102K| 280 (2)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| GENERALWORKS | 3906 | 102K| 280 (2)| 00:00:04 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GENCATG"='ab')

Statistics
----------------------------------------------------------
386 recursive calls
0 db block gets
1577 consistent gets
1257 physical reads
0 redo size
112736 bytes sent via SQL*Net to client
3197 bytes received via SQL*Net from client
258 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
3847 rows processed

Here again, there is a huge increase in “Consistent Gets” and “Physical Reads” and the query is undergoing a table scan. This is because we have used a column in the where clause which is in the third position of the index and does not have statistics of its own. Hence, when this query is executed, the statistics by optimizer for this query will be of no use and is not used to arrive at the proper execution plan.

Scenario 6:

SQL> select * from generalworks where genno > 40000;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1452202321
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9999 | 263K| 94 (2)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 9999 | 263K| 94 (2)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 9999 | | 40 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("GENNO">40000 AND "GENNO" IS NOT NULL)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1421 consistent gets
41 physical reads
0 redo size
417145 bytes sent via SQL*Net to client
7707 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed

Since the first column in the index is used in the where clause, the optimizer is able to arrive at the proper execution plan using the column. Hence index path is chosen to execute this query.

Scenario 7:

SQL> select * from generalworks where gencatg='ab' and genname='aaaabbbb' and genno > 10000;

3077 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1452202321
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 244 | 6588 | 161 (2)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 244 | 6588 | 161 (2)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 244 | | 159 (2)| 00:00:02 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("GENNO">10000 AND "GENNAME"='aaaabbbb' AND "GENCATG"='ab' AND "GENNO" IS NOT NULL)
filter("GENCATG"='ab' AND "GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
183 recursive calls
0 db block gets
773 consistent gets
0 physical reads
0 redo size
128826 bytes sent via SQL*Net to client
2636 bytes received via SQL*Net from client
207 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
3077 rows processed

The above scenario is provided to illustrate an important point. If an index is create on three columns viz Column a, Column b and Column c, then these three columns can be used in any order (i.e.)

Where Column a = ‘’ or Column b=’’ and Column c=’’
Where Column b = ‘’ or Column c=’’ and Column a=’’
Where Column c = ‘’ or Column a=’’ and Column b=’’
Where Column a = ‘’ or Column c=’’ and Column b=’’
Where Column b = ‘’ or Column a=’’ and Column c=’’
Where Column c = ‘’ or Column b=’’ and Column a=’’

All the above combination will yield the index usage and should not create any major difference in the level of execution.

Conclusion:

If the columns are not always used together, please do create indexes on separate columns. If the columns are always used together, please do create composite indexes. Please also note that creation of index depends on the use case.

Myth 2: Full Table Scan is evil

Many think that full table scan is evil and they try to create indexes or use hints to convert this full table scan into other types of scan where indexes can be used. However, it is not so. Full table scan is never an evil way of executing the query. The reason for the optimizer to decide on the full table scan is probably due to low execution cost. From Oracle documentation, the following on Cost Based Optimizer (CBO) is known:
The CBO determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The CBO performs the following steps:
1. The optimizer generates a set of potential plans for the SQL statement based on available access paths and hints.
2. The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement.
3. The optimizer compares the costs of the plans and chooses the one with the lowest cost.
Let us take the example using the tables created in Myth 1 and see what happens if we induce the index. The following statement is without using an index hint and hence it goes for the full table scan.

SQL> select * from generalworks where genno > 10000;

40000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 857713684
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40000 | 1054K| 280 (2)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| GENERALWORKS | 40000 | 1054K| 280 (2)| 00:00:04 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("GENNO">10000)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3913 consistent gets
0 physical reads
0 redo size
1506899 bytes sent via SQL*Net to client
29707 bytes received via SQL*Net from client
2668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40000 rows processed

Now, we shall use the index hint so that the optimizer will use this index to execute the query.

SQL> select /*+ index(generalworks genworkscompindx) */ * from generalworks where genno > 10000;

40000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1452202321

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40000 | 1054K| 372 (1)| 00:00:05 |
| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 40000 | 1054K| 372 (1)| 00:00:05 |
|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 40000 | | 159 (2)| 00:00:02 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("GENNO">10000 AND "GENNO" IS NOT NULL)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5685 consistent gets
117 physical reads
0 redo size
1666842 bytes sent via SQL*Net to client
29707 bytes received via SQL*Net from client
2668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40000 rows processed

Now, we have successfully made this query to execute by using the index. However, the following is to be noted:

a) Cost is increased
b) Consistent gets is increased
c) Physical Reads increased

Reason for this is, the optimizer when it compared the cost of executing the queries for various options, it found that using index for this is a costly affair. Hence it chose the full table scan. But when we induced the index usage, the execution plan is observed to be costly.

Having said this, the following are also to be noted:

a) If the cost is less after inducing the index, it means that proper statistics is not fed into the optimizer. Please make sure that all necessary statistics are updated in the tables.
b) Table scan in some occasions may prove to be an evil, if we do not perform the normal ground work like creating appropriate indexes, statistics collection and proper joins.

Myth 3: Rebuilding Index leads to faster retrieval of records.

Rebuilding an index would to some extent decrease the execution time of a select query, but increases the execution time of the DML operation. Please note that I said some extent and not to a greater extent. When we rebuild an index, it tightly compacts the index in such a way that most of the indexes almost lie near to each other there on the query operations completes a bit faster. However, this will also affect the involved DML operations. Let us illustrate this with an example.

Let me create a table, insert some data into it and study the corresponding statistics. After that we will rebuild the involved indexes and study the corresponding statistics and compare the same.

SQL> create table rebuildcheck (firstcol number);

Table created.

SQL> create index rebuildindex on rebuildcheck(firstcol);

Index created.

I created a small procedure to insert the data into this table as follows:

create or replace procedure rebuildproc as
begin
for i in 1 .. 400000
loop
insert into rebuildcheck values (trunc(dbms_random.value(1,500000)));
end loop;
commit;
end;

In order to study the behaviour of this insert operation, I turned on the SQL trace as follows:

SQL> ALTER SYSTEM SET TIMED_STATISTICS = TRUE;

System altered.

SQL> alter session set sql_trace=true;

Session altered.

SQL> exec rebuildproc

PL/SQL procedure successfully completed.

SQL> alter session set sql_trace=false;

Session altered.

Now in my command prompt, I converted the generated trace file into a text file using TKPROF utility as follows:


tkprof D:\oracle\product\10.2.0\admin\orcl\udump\
orcl_ora_3820.trc d:\oracleexperiments\rebuildtrace.txt explain=hari/hari

The result shows the execution details of this procedure as follows:

BEGIN rebuildproc; END;

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 48.17 51.25 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 48.18 51.25 0 0 0 1

CPU is 48 seconds with the elapsed time of 51 seconds

Now, let us also select from this table and see the execution plan

SQL> select * from rebuildcheck;

400000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3132651637
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 378K| 4803K| 149 (8)| 00:00:02 |
| 1 | TABLE ACCESS FULL| REBUILDCHECK | 378K| 4803K| 149 (8)| 00:00:02 |
----------------------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
27359 consistent gets
0 physical reads
0 redo size
7454987 bytes sent via SQL*Net to client
293707 bytes received via SQL*Net from client
26668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
400000 rows processed

Please note that even if we induce index the cost remains more or less the same as shown below:

SQL> select /*+ index(rebuildcheck rebuildindex) */ * from rebuildcheck;

400000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3132651637
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 345K| 4384K| 148 (7)| 00:00:02 |
| 1 | TABLE ACCESS FULL| REBUILDCHECK | 345K| 4384K| 148 (7)| 00:00:02 |
----------------------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
205 recursive calls
1 db block gets
27392 consistent gets
0 physical reads
176 redo size
5854855 bytes sent via SQL*Net to client
293707 bytes received via SQL*Net from client
26668 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
400000 rows processed

Now, let us rebuild the index and see what happens:

SQL> alter index rebuildindex rebuild;

Index altered.

SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'REBUILDCHECK');

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.gather_index_stats('HARI', 'REBUILDINDEX');

PL/SQL procedure successfully completed.

SQL> select * from rebuildcheck;

400000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3132651637
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 378K| 4803K| 149 (8)| 00:00:02 |
| 1 | TABLE ACCESS FULL| REBUILDCHECK | 378K| 4803K| 149 (8)| 00:00:02 |
----------------------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
50 recursive calls
0 db block gets
27365 consistent gets
0 physical reads
0 redo size
7454987 bytes sent via SQL*Net to client
293707 bytes received via SQL*Net from client
26668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
400000 rows processed



SQL> select /*+ index(rebuildcheck rebuildindex) */ * from rebuildcheck;

400000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3132651637
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 400K| 1562K| 149 (8)| 00:00:02 |
| 1 | TABLE ACCESS FULL| REBUILDCHECK | 400K| 1562K| 149 (8)| 00:00:02 |
----------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
27293 consistent gets
0 physical reads
0 redo size
5854855 bytes sent via SQL*Net to client
293707 bytes received via SQL*Net from client
26668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
400000 rows processed

BEGIN rebuildproc; END;

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.03 0.00 0 0 0 0
Execute 2 101.79 108.58 0 0 0 2
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 101.82 108.59 0 0 0 2

Misses in library cache during parse: 2
Optimizer mode: ALL_ROWS
Parsing user id: 62 (HARI)

Please note that CPU is increased 102 seconds with the elapsed time of 109 seconds. Almost 54 seconds more than expected with 58 seconds more on elapsed time. Also, note that not much improvement is noticed on the query execution.

The argument on using the rebuild index option still continues in this database world. One side of experts advise not to use rebuild option at all while the other end of experts advise to use the same. If we use rebuild, the issue is on using the DML statements.