Next generation strategy and solutions: October 2010

Friday, October 8, 2010

Facts on Indexes

Author:Hariharan T Sairam. My Best Friend :)

In a sequence of my learning and experiments, I found some interesting facts on Indexes, which I thought I can provide as an article.

Some of the most celebrated myths are:

a) Difference in creation of Indexes

b) Full table scan is evil

c) Rebuilding Index leads to faster retrieval of records

Myth 1: Difference in creation of Indexes

In this section, we shall consider two cases viz Indexes on individual columns and composite index.

Case 1: Indexes on individual columns

Let us take an example to illustrate the creation of indexes.

We first create a table called GENERALWORKS, with the following columns:

create table generalworks

(

GENNO NUMBER(7),

GENNAME VARCHAR2(40),

GENCATG VARCHAR2(2),

GENDESC VARCHAR2(3),

GENDETS VARCHAR2(6)

)

Also, for this table, let us create three indexes, one on each column as follows:

create index GENERALWORKSGENCATGIDX on generalworks(gencatg);

create index GENERALWORKSGENNMEIDX on generalworks(genname);

create index GENERALWORKSGENIDX on generalworks(genno);

Let me also insert 50,000 records into this table.

SQL> select count(*) from generalworks;

COUNT(*)

----------

50000

To be on the safer side, let me also gather the statistics for the table and indexes as follows:

SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'GENERALWORKS');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('HARI', 'GENERALWORKSGENCATGIDX');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('HARI', 'GENERALWORKSGENNMEIDX');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('HARI', 'GENERALWORKSGENIDX');

PL/SQL procedure successfully completed.

Scenario 1:

We will now retrieve the records based on the columns GENNAME and GENCATG

SQL> select * from generalworks where genname='aaaabbbb' and gencatg='ab';

3847 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 4242548220

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 298 | 8046 | 99 (2)| 00:00:02 |

| 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 298 | 8046 | 99 (2)| 00:00:02 |

| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |

| 3 | BITMAP AND | | | | | |

| 4 | BITMAP CONVERSION FROM ROWIDS | | | | | |

|* 5 | INDEX RANGE SCAN | GENERALWORKSGENCATGIDX | 3860 | | 12 (0)| 00:00:01 |

| 6 | BITMAP CONVERSION FROM ROWIDS | | | | | |

|* 7 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 | | 16 (0)| 00:00:01 |

-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

5 - access("GENCATG"='ab')

7 - access("GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

571 consistent gets

0 physical reads

0 redo size

160080 bytes sent via SQL*Net to client

3197 bytes received via SQL*Net from client

258 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3847 rows processed

From the above execution plan, it is quiet evident that the indexes are getting used and the cost is also less to retrieve 3,847 records. But, wait a minute, something different is seen in the plan. BITMAP? We have created only B*Tree indexes and we are seeing BITMAP in the plan? Reason being, in order to retrieve the data back faster, Oracle decided to implement the mechanism of creating a BITMAP INDEX on the fly. This is done only for certain types of query which will get benefited by this. Otherwise, No.

Also note that the execution plan shows 298 rows for the Step 1 and 0, but actual number processed was 3847. The reason being, execution plan is always an estimate and need not always be similar to the actual plan. Also, there are queries where the actual plan will never coincide with the estimated plan.

Scenario 2:

SQL> select * from generalworks where genname='aaaabbbb';

3847 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1336713796

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3860 | 101K|245 (1)| 00:00:03 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 3860 | 101K|245 (1)| 00:00:03 |

|* 2 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 | | 16 (0) | 00:00:01 |

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

-----------------------------------------------------------

2 - access("GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

785 consistent gets

0 physical reads

0 redo size

160080 bytes sent via SQL*Net to client

3197 bytes received via SQL*Net from client

258 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3847 rows processed

Scenario 3:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb';

3077 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1336713796

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3090 | 83430 |245 (1)| 00:00:03 |

|* 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 3090 | 83430 |245 (1)| 00:00:03 |

|* 2 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 | | 16 (0)| 00:00:01 |

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("GENNO">10000)

2 - access("GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

685 consistent gets

0 physical reads

0 redo size

90912 bytes sent via SQL*Net to client

2636 bytes received via SQL*Net from client

207 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3077 rows processed

Scenario 4:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb' and gencatg='ab

3077 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 4242548220

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 239 | 6453 | 99 (2)| 00:00:02 |

|* 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 239 | 6453 | 99 (2)| 00:00:02 |

| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |

| 3 | BITMAP AND | | | | | |

| 4 | BITMAP CONVERSION FROM ROWIDS | | | | | |

|* 5 | INDEX RANGE SCAN | GENERALWORKSGENCATGIDX | 3860 | 12 (0)| 00:00:01 |

| 6 | BITMAP CONVERSION FROM ROWIDS| | | | | |

|* 7 | INDEX RANGE SCAN | GENERALWORKSGENNMEIDX | 3860 || 16 (0)| 00:00:01 |

-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("GENNO">10000)

5 - access("GENCATG"='ab')

7 - access("GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

161 recursive calls

0 db block gets

555 consistent gets

0 physical reads

0 redo size

90912 bytes sent via SQL*Net to client

2636 bytes received via SQL*Net from client

207 SQL*Net roundtrips to/from client

6 sorts (memory)

0 sorts (disk)

3077 rows processed

Here again, the BITMAP conversion is used with the number of rows not tallying with the actual processed.

Scenario 5:

SQL> select * from generalworks where gencatg='ab';

3847 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 418290240

------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3906 | 102K| 239 (1)| 00:00:03 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 3906 | 102K| 239 (1)| 00:00:03 |

|* 2 | INDEX RANGE SCAN | GENERALWORKSGENCATGIDX | 3906 | | 8 (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENCATG"='ab')

Statistics

----------------------------------------------------------

203 recursive calls

0 db block gets

773 consistent gets

12 physical reads

0 redo size

160080 bytes sent via SQL*Net to client

3197 bytes received via SQL*Net from client

258 SQL*Net roundtrips to/from client

6 sorts (memory)

0 sorts (disk)

3847 rows processed

Scenario 6:

SQL> select * from generalworks where genno > 40000;

10000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 2559079067

--------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 9999 | 263K| 77 (2)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID | GENERALWORKS | 9999 | 263K| 77 (2)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | GENERALWORKSGENIDX | 9999 | | 23 (0)| 00:00:01 |

--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENNO">40000)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

1406 consistent gets

24 physical reads

0 redo size

417145 bytes sent via SQL*Net to client

7707 bytes received via SQL*Net from client

668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

10000 rows processed

Note:

If we replace the condition “genno > 40000” with “genno > 10000” we will incur FULL TABLE SCAN as the number of records to be selected is 40,000.

Case 2: Composite Indexes

Instead of creating individual indexes for the 3 columns, let us drop the indexes and create a composite index (as below) and study its execution plan.

create index genworkscompindx on generalworks(genno,genname,gencatg);

In this, first position is GENNO, 2nd position is GENNAME and 3rd is GENCATG. Reason for creating this composite index is to analyze and understand the difference in creating indexes in these two different forms. Now, we shall analyze all the scenarios that we have tested and study the implications

Scenario 1

SQL> select * from generalworks where genname='aaaabbbb' and gencatg='ab';

3847 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 857713684

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 298 | 8046 | 280 (2)| 00:00:04 |

|* 1 | TABLE ACCESS FULL| GENERALWORKS | 298 | 8046 | 280 (2)| 00:00:04 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("GENNAME"='aaaabbbb' AND "GENCATG"='ab')

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

1514 consistent gets

0 physical reads

0 redo size

112736 bytes sent via SQL*Net to client

3197 bytes received via SQL*Net from client

258 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3847 rows processed

Please note that the parameter “consistent gets” is high and this query is undergoing full table scan. Also, the select statement took two seconds more than the first statement. We now need to identify the reason for the table scan (even though table scan is not evil). On the surface level, we might say that the columns GENNAME and GENCATG are in positions 2 and 3 and hence the index is not been used properly. If we execute the following statement

SELECT INDEX_NAME, COLUMN_NAME, NUM_ROWS, BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR FROM USER_INDEXES UI, USER_IND_COLUMNS UIC WHERE UI.INDEX_NAME=UIC.INDEX_NAME AND UI.TABLE_NAME=’GENERALWORKS’;

We observe that the CLUSTERING_FACTOR to be 264 and LEAF_BLOCKS to be 194, which actually shows that the indexes will be used as the CLUSTERING_FACTOR is almost close to LEAF_BLOCKS. However, the index will be used only as a whole (i.e.) GENNO, GENNAME and GENCATG (any combination of this). If we carefully study the rest of the scenarios, most will lead to table scan. Indexes will be used when we use all the columns that are indexed in the where clause.

Scenario 2:

SQL> select * from generalworks where genname='aaaabbbb';

3847 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 857713684

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3860 | 101K| 279 (2)| 00:00:04 |

|* 1 | TABLE ACCESS FULL| GENERALWORKS | 3860 | 101K| 279 (2)| 00:00:04 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

1514 consistent gets

0 physical reads

0 redo size

112736 bytes sent via SQL*Net to client

3197 bytes received via SQL*Net from client

258 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3847 rows processed

Please note that the value for consistent gets is high (almost doubled). Also we can observe that the query is undergoing table scan.

Scenario 3:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb';

3077 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1452202321

----------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3125 | 84375 | 176 (2)| 00:00:03 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 3125 | 84375 | 176 (2)| 00:00:03 |

|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 3125 | | 159 (2)| 00:00:02 |

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENNO">10000 AND "GENNAME"='aaaabbbb' AND "GENNO" IS NOT NULL)

filter("GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

386 recursive calls

0 db block gets

806 consistent gets

376 physical reads

0 redo size

128826 bytes sent via SQL*Net to client

2636 bytes received via SQL*Net from client

207 SQL*Net roundtrips to/from client

8 sorts (memory)

0 sorts (disk)

3077 rows processed

There is something quite interesting happening in this query. Cost has come down from 245 to 176 for access by ROWID. However, the cost of the index range scan has increased drastically from 16 to 159. Here, the cost specified is for the whole index GENWORKSCOMPINDX, which comprises 3 columns and not a single column. Hence the cost is based on the whole index and not for an individual column. Now, if we drop this index and create an index for individual columns, the cost will be less as it will be based on a single column.

Scenario 4:

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb' and gencatg='ab';

3077 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1452202321

------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 244 | 6588 | 161 (2)| 00:00:02 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 244 | 6588 | 161 (2)| 00:00:02 |

|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 244 | | 159 (2)| 00:00:02 |

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENNO">10000 AND "GENNAME"='aaaabbbb' AND "GENCATG"='ab' AND "GEN

NO" IS NOT NULL) filter("GENNAME"='aaaabbbb' AND "GENCATG"='ab')

Statistics

----------------------------------------------------------

141 recursive calls

0 db block gets

769 consistent gets

0 physical reads

0 redo size

128826 bytes sent via SQL*Net to client

2636 bytes received via SQL*Net from client

207 SQL*Net roundtrips to/from client

6 sorts (memory)

0 sorts (disk)

3077 rows processed

Here again, we could see an increase in the cost for the range scan.

Scenario 5:

SQL> select * from generalworks where gencatg='ab';

3847 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 857713684

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3906 | 102K| 280 (2)| 00:00:04 |

|* 1 | TABLE ACCESS FULL| GENERALWORKS | 3906 | 102K| 280 (2)| 00:00:04 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("GENCATG"='ab')

Statistics

----------------------------------------------------------

386 recursive calls

0 db block gets

1577 consistent gets

1257 physical reads

0 redo size

112736 bytes sent via SQL*Net to client

3197 bytes received via SQL*Net from client

258 SQL*Net roundtrips to/from client

8 sorts (memory)

0 sorts (disk)

3847 rows processed

Here again, there is a huge increase in “Consistent Gets” and “Physical Reads” and the query is undergoing a table scan. This is because we have used a column in the where clause which is in the third position of the index and does not have statistics of its own. Hence, when this query is executed, the statistics by optimizer for this query will be of no use and is not used to arrive at the proper execution plan.

Scenario 6:

SQL> select * from generalworks where genno > 40000;

10000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1452202321

------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 9999 | 263K| 94 (2)| 00:00:02 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 9999 | 263K| 94 (2)| 00:00:02 |

|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 9999 | | 40 (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENNO">40000 AND "GENNO" IS NOT NULL)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

1421 consistent gets

41 physical reads

0 redo size

417145 bytes sent via SQL*Net to client

7707 bytes received via SQL*Net from client

668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

10000 rows processed

Since the first column in the index is used in the where clause, the optimizer is able to arrive at the proper execution plan using the column. Hence index path is chosen to execute this query.

Scenario 7:

SQL> select * from generalworks where gencatg='ab' and genname='aaaabbbb' and genno > 10000;

3077 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1452202321

------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 244 | 6588 | 161 (2)| 00:00:02 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 244 | 6588 | 161 (2)| 00:00:02 |

|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 244 | | 159 (2)| 00:00:02 |

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENNO">10000 AND "GENNAME"='aaaabbbb' AND "GENCATG"='ab' AND "GENNO" IS NOT NULL)

filter("GENCATG"='ab' AND "GENNAME"='aaaabbbb')

Statistics

----------------------------------------------------------

183 recursive calls

0 db block gets

773 consistent gets

0 physical reads

0 redo size

128826 bytes sent via SQL*Net to client

2636 bytes received via SQL*Net from client

207 SQL*Net roundtrips to/from client

6 sorts (memory)

0 sorts (disk)

3077 rows processed

The above scenario is provided to illustrate an important point. If an index is create on three columns viz Column a, Column b and Column c, then these three columns can be used in any order (i.e.)

Where Column a = ‘’ or Column b=’’ and Column c=’’

Where Column b = ‘’ or Column c=’’ and Column a=’’

Where Column c = ‘’ or Column a=’’ and Column b=’’

Where Column a = ‘’ or Column c=’’ and Column b=’’

Where Column b = ‘’ or Column a=’’ and Column c=’’

Where Column c = ‘’ or Column b=’’ and Column a=’’

All the above combination will yield the index usage and should not create any major difference in the level of execution.

Conclusion:

If the columns are not always used together, please do create indexes on separate columns. If the columns are always used together, please do create composite indexes. Please also note that creation of index depends on the use case.

Myth 2: Full Table Scan is evil

Many think that full table scan is evil and they try to create indexes or use hints to convert this full table scan into other types of scan where indexes can be used. However, it is not so. Full table scan is never an evil way of executing the query. The reason for the optimizer to decide on the full table scan is probably due to low execution cost. From Oracle documentation, the following on Cost Based Optimizer (CBO) is known:

The CBO determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The CBO performs the following steps:

1. The optimizer generates a set of potential plans for the SQL statement based on available access paths and hints.

2. The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement.

3. The optimizer compares the costs of the plans and chooses the one with the lowest cost.

Let us take the example using the tables created in Myth 1 and see what happens if we induce the index. The following statement is without using an index hint and hence it goes for the full table scan.

SQL> select * from generalworks where genno > 10000;

40000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 857713684

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 40000 | 1054K| 280 (2)| 00:00:04 |

|* 1 | TABLE ACCESS FULL| GENERALWORKS | 40000 | 1054K| 280 (2)| 00:00:04 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("GENNO">10000)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

3913 consistent gets

0 physical reads

0 redo size

1506899 bytes sent via SQL*Net to client

29707 bytes received via SQL*Net from client

2668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

40000 rows processed

Now, we shall use the index hint so that the optimizer will use this index to execute the query.

SQL> select /*+ index(generalworks genworkscompindx) */ * from generalworks where genno > 10000;

40000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1452202321

------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 40000 | 1054K| 372 (1)| 00:00:05 |

| 1 | TABLE ACCESS BY INDEX ROWID| GENERALWORKS | 40000 | 1054K| 372 (1)| 00:00:05 |

|* 2 | INDEX RANGE SCAN | GENWORKSCOMPINDX | 40000 | | 159 (2)| 00:00:02 |

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("GENNO">10000 AND "GENNO" IS NOT NULL)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

5685 consistent gets

117 physical reads

0 redo size

1666842 bytes sent via SQL*Net to client

29707 bytes received via SQL*Net from client

2668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

40000 rows processed

Now, we have successfully made this query to execute by using the index. However, the following is to be noted:

a) Cost is increased

b) Consistent gets is increased

c) Physical Reads increased

Reason for this is, the optimizer when it compared the cost of executing the queries for various options, it found that using index for this is a costly affair. Hence it chose the full table scan. But when we induced the index usage, the execution plan is observed to be costly.

Having said this, the following are also to be noted:

a) If the cost is less after inducing the index, it means that proper statistics is not fed into the optimizer. Please make sure that all necessary statistics are updated in the tables.

b) Table scan in some occasions may prove to be an evil, if we do not perform the normal ground work like creating appropriate indexes, statistics collection and proper joins.

Myth 3: Rebuilding Index leads to faster retrieval of records.

Rebuilding an index would to some extent decrease the execution time of a select query, but increases the execution time of the DML operation. Please note that I said some extent and not to a greater extent. When we rebuild an index, it tightly compacts the index in such a way that most of the indexes almost lie near to each other there on the query operations completes a bit faster. However, this will also affect the involved DML operations. Let us illustrate this with an example.

Let me create a table, insert some data into it and study the corresponding statistics. After that we will rebuild the involved indexes and study the corresponding statistics and compare the same.

SQL> create table rebuildcheck (firstcol number);

Table created.

SQL> create index rebuildindex on rebuildcheck(firstcol);

Index created.

I created a small procedure to insert the data into this table as follows:

create or replace procedure rebuildproc as

begin

for i in 1 .. 400000

loop

insert into rebuildcheck values (trunc(dbms_random.value(1,500000)));

end loop;

commit;

end;

In order to study the behaviour of this insert operation, I turned on the SQL trace as follows:

SQL> ALTER SYSTEM SET TIMED_STATISTICS = TRUE;

System altered.

SQL> alter session set sql_trace=true;

Session altered.

SQL> exec rebuildproc

PL/SQL procedure successfully completed.

SQL> alter session set sql_trace=false;

Session altered.

Now in my command prompt, I converted the generated trace file into a text file using TKPROF utility as follows:

tkprof D:\oracle\product\10.2.0\admin\orcl\udump\

orcl_ora_3820.trc d:\oracleexperiments\rebuildtrace.txt explain=hari/hari

The result shows the execution details of this procedure as follows:

BEGIN rebuildproc; END;

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.01 0.00 0 0 0 0

Execute 1 48.17 51.25 0 0 0 1

Fetch 0 0.00 0.00 0 0 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 2 48.18 51.25 0 0 0 1

CPU is 48 seconds with the elapsed time of 51 seconds

Now, let us also select from this table and see the execution plan

SQL> select * from rebuildcheck;

400000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3132651637

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 378K| 4803K| 149 (8)| 00:00:02 |

| 1 | TABLE ACCESS FULL| REBUILDCHECK | 378K| 4803K| 149 (8)| 00:00:02 |

----------------------------------------------------------------------------------

Note

-----

- dynamic sampling used for this statement

Statistics

----------------------------------------------------------

4 recursive calls

0 db block gets

27359 consistent gets

0 physical reads

0 redo size

7454987 bytes sent via SQL*Net to client

293707 bytes received via SQL*Net from client

26668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

400000 rows processed

Please note that even if we induce index the cost remains more or less the same as shown below:

SQL> select /*+ index(rebuildcheck rebuildindex) */ * from rebuildcheck;

400000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3132651637

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 345K| 4384K| 148 (7)| 00:00:02 |

| 1 | TABLE ACCESS FULL| REBUILDCHECK | 345K| 4384K| 148 (7)| 00:00:02 |

----------------------------------------------------------------------------------

Note

-----

- dynamic sampling used for this statement

Statistics

----------------------------------------------------------

205 recursive calls

1 db block gets

27392 consistent gets

0 physical reads

176 redo size

5854855 bytes sent via SQL*Net to client

293707 bytes received via SQL*Net from client

26668 SQL*Net roundtrips to/from client

5 sorts (memory)

0 sorts (disk)

400000 rows processed

Now, let us rebuild the index and see what happens:

SQL> alter index rebuildindex rebuild;

Index altered.

SQL> EXEC DBMS_STATS.gather_table_stats('HARI', 'REBUILDCHECK');

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.gather_index_stats('HARI', 'REBUILDINDEX');

PL/SQL procedure successfully completed.

SQL> select * from rebuildcheck;

400000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3132651637

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 378K| 4803K| 149 (8)| 00:00:02 |

| 1 | TABLE ACCESS FULL| REBUILDCHECK | 378K| 4803K| 149 (8)| 00:00:02 |

----------------------------------------------------------------------------------

Note

-----

- dynamic sampling used for this statement

Statistics

----------------------------------------------------------

50 recursive calls

0 db block gets

27365 consistent gets

0 physical reads

0 redo size

7454987 bytes sent via SQL*Net to client

293707 bytes received via SQL*Net from client

26668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

400000 rows processed

SQL> select /*+ index(rebuildcheck rebuildindex) */ * from rebuildcheck;

400000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3132651637

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 400K| 1562K| 149 (8)| 00:00:02 |

| 1 | TABLE ACCESS FULL| REBUILDCHECK | 400K| 1562K| 149 (8)| 00:00:02 |

----------------------------------------------------------------------------------

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

27293 consistent gets

0 physical reads

0 redo size

5854855 bytes sent via SQL*Net to client

293707 bytes received via SQL*Net from client

26668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

400000 rows processed

BEGIN rebuildproc; END;

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 2 0.03 0.00 0 0 0 0

Execute 2 101.79 108.58 0 0 0 2

Fetch 0 0.00 0.00 0 0 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 101.82 108.59 0 0 0 2

Misses in library cache during parse: 2

Optimizer mode: ALL_ROWS

Parsing user id: 62 (HARI)

Please note that CPU is increased 102 seconds with the elapsed time of 109 seconds. Almost 54 seconds more than expected with 58 seconds more on elapsed time. Also, note that not much improvement is noticed on the query execution.

The argument on using the rebuild index option still continues in this database world. One side of experts advise not to use rebuild option at all while the other end of experts advise to use the same. If we use rebuild, the issue is on using the DML statements.

Database view is not an evil database object

Author:Hariharan T Sairam

Many folks in IT industry think that using database view is an evil, degrade performance on usage and should never be used. This article provides substantial evidence that view never impacts performance.

By definition, a database view is a virtual query. It never exists as an object. If a view is slow in performance, it is not due to view and it is due to the underlying query. Many fail to understand this basic concept. In order to understand this, I have created the following tables with index on EMPNO and each table has 100,000 records.

1. EMPLOYEE

2. EMPSALARY

3. EMPADDRESS

To begin with, I have created a view called "EMPVIEW" joining the tables EMPLOYEE and EMPSALARY and its definition is given below:

CREATE VIEW EMPVIEW AS SELECT EMP.EMPNO, EMP.EMPNAME, EMPS.EMPMONTH, EMPS.EMPBASIC, EMPS.EMPHRA, EMPS.EMPCONV FROM EMPLOYEE EMP JOIN EMPSALARY EMPS ON EMP.EMPNO=EMPS.EMPNO;

SQL> SELECT * FROM EMPVIEW;

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 820308138

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 83 | 5 (20)| 00:00:01 |

|* 1 | HASH JOIN | | 1 | 83 | 5 (20)| 00:00:01 |

| 2 | TABLE ACCESS FULL| EMPLOYEE | 1 | 25 | 2 (0)| 00:00:01 |

| 3 | TABLE ACCESS FULL| EMPSALARY | 1 | 58 | 2 (0)| 00:00:01 |

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access ("EMP"."EMPNO"="EMPS"."EMPNO")

Statistics

----------------------------------------------------------

612 recursive calls

0 db block gets

7436 consistent gets

0 physical reads

0 redo size

3518422 bytes sent via SQL*Net to client

73707 bytes received via SQL*Net from client

6668 SQL*Net roundtrips to/from client

16 sorts (memory)

0 sorts (disk)

100000 rows processed

Now, if we execute the underlying query of the view (which is given below), obviously we will come to know that the explain plan is same except for a very small difference in the statistics.

SELECT EMP.EMPNO, EMP.EMPNAME, EMPS.EMPMONTH, EMPS.EMPBASIC, EMPS.EMPHRA, EMPS.EMPCONV FROM EMPLOYEE EMP JOIN EMPSALARY EMPS ON EMP.EMPNO=EMPS.EMPNO;

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 820308138

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 83 | 5 (20)| 00:00:01 |

|* 1 | HASH JOIN | | 1 | 83 | 5 (20)| 00:00:01 |

| 2 | TABLE ACCESS FULL| EMPLOYEE | 1 | 25 | 2 (0)| 00:00:01 |

| 3 | TABLE ACCESS FULL| EMPSALARY | 1 | 58 | 2 (0)| 00:00:01 |

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("EMP"."EMPNO"="EMPS"."EMPNO")

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

7343 consistent gets

0 physical reads

0 redo size

3518422 bytes sent via SQL*Net to client

73707 bytes received via SQL*Net from client

6668 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

100000 rows processed

Now, I am joining all the three tables and the explain plan for the same is given below:

SELECT EMP.EMPNO, EMP.EMPNAME, EMPA.EMPADDR1, EMPA.EMPADDR2, EMPA.EMPADDR3, EMPS.EMPMONTH, EMPS.EMPBASIC, EMPS.EMPHRA, EMPS.EMPCONV FROM EMPLOYEE EMP JOIN EMPADDRESS EMPA ON EMP.EMPNO=EMPA.EMPNO JOIN EMPSALARY EMPS ON EMP.EMPNO=EMPS.EMPNO

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3201299630

------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100K| 4492K| | 838 (4)|00:00:11 |

|* 1 | HASH JOIN | | 100K| 4492K| 3328K| 838 (4)|00:00:11 |

|* 2 | HASH JOIN | | 100K| 2148K| 1960K| 398 (4)|00:00:05 |

| 3 | TABLE ACCESS FULL| EMPLOYEE | 100K| 781K| | 72 (5)|00:00:01 |

| 4 | TABLE ACCESS FULL| EMPADDRESS | 100K| 1367K| | 100 (4)|00:00:02 |

| 5 | TABLE ACCESS FULL | EMPSALARY | 100K| 2343K| | 100 (4)|00:00:02 |

------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("EMP"."EMPNO"="EMPS"."EMPNO")

2 - access("EMP"."EMPNO"="EMPA"."EMPNO")

Statistics

----------------------------------------------------------

942 recursive calls

0 db block gets

7922 consistent gets

1130 physical reads

0 redo size

3618604 bytes sent via SQL*Net to client

73707 bytes received via SQL*Net from client

6668 SQL*Net roundtrips to/from client

18 sorts (memory)

0 sorts (disk)

100000 rows processed

Now let us join the view with the table “EMPADDRESS” and see the results:

SELECT EMPV.EMPNO, EMPV.EMPNAME, EMPA.EMPADDR1, EMPA.EMPADDR2, EMPA.EMPADDR3, EMPV.EMPMONTH, EMPV.EMPBASIC, EMPV.EMPHRA, EMPV.EMPCONV FROM EMPVIEW EMPV JOIN EMPADDRESS EMPA ON EMPV.EMPNO=EMPA.EMPNO

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3201299630

------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100K| 4492K| | 838 (4)|00:00:11 |

|* 1 | HASH JOIN | | 100K| 4492K| 3328K| 838 (4)|00:00:11 |

|* 2 | HASH JOIN | | 100K| 2148K| 1960K| 398 (4)|00:00:05 |

| 3 | TABLE ACCESS FULL| EMPLOYEE | 100K| 781K| | 72 (5)|00:00:01 |

| 4 | TABLE ACCESS FULL| EMPADDRESS | 100K| 1367K| | 100 (4)|00:00:02 |

| 5 | TABLE ACCESS FULL | EMPSALARY | 100K| 2343K| | 100 (4)|00:00:02 |

------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("EMP"."EMPNO"="EMPS"."EMPNO")

2 - access("EMP"."EMPNO"="EMPA"."EMPNO")

Statistics

----------------------------------------------------------

588 recursive calls

0 db block gets

7878 consistent gets

3 physical reads

0 redo size

3618604 bytes sent via SQL*Net to client

73707 bytes received via SQL*Net from client

6668 SQL*Net roundtrips to/from client

20 sorts (memory)

0 sorts (disk)

100000 rows processed

This proves a point that the execution plan of a query with view is same as using the actual query. However, let us see what happens if we create a view on another view. For this let us create another table called "EMPWORKS" having 100,000 records

Let us create a view called “EMPWORKVIEW” by joining the already created EMPVIEW with EMPWORKS table.

CREATE VIEW EMPWORKVIEW AS SELECT EMPV.EMPNO, EMPV.EMPNAME, EMPW.EMPWRKAREA, EMPW.EMPWRKLOC, EMPW.EMPROLE, EMPV.EMPMONTH, EMPV.EMPBASIC, EMPV.EMPHRA, EMPV.EMPCONV FROM EMPVIEW EMPV JOIN EMPWORKS EMPW ON EMPV.EMPNO=EMPW.EMPNO

SELECT * FROM EMPWORKVIEW

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 2018455878

-----------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 84747 | 5462K| | 918 (3)| 00:00:12 |

|* 1 | HASH JOIN | | 84747 | 5462K| 3808K| 918 (3)| 00:00:12 |

| 2 | TABLE ACCESS FULL | EMPWORKS | 84747 | 2813K| | 72 (5)| 00:00:01 |

|* 3 | HASH JOIN | | 100K| 3125K| 1960K| 446 (4)| 00:00:06 |

| 4 | TABLE ACCESS FULL| EMPLOYEE | 100K| 781K| | 72 (5)| 00:00:01 |

| 5 | TABLE ACCESS FULL| EMPSALARY | 100K| 2343K| | 100 (4)| 00:00:02 |

-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access ("EMP"."EMPNO"="EMPW"."EMPNO")

3 - access ("EMP"."EMPNO"="EMPS"."EMPNO")

Note

-----

- dynamic sampling used for this statement

Statistics

----------------------------------------------------------

252 recursive calls

1 db block gets

7751 consistent gets

0 physical reads

176 redo size

4502964 bytes sent via SQL*Net to client

73707 bytes received via SQL*Net from client

6668 SQL*Net roundtrips to/from client

7 sorts (memory)

0 sorts (disk)

100000 rows processed

Now, if we join the three tables EMPWORKS, EMPSALARY and EMPWORKS we can see that the execution plan remains the same, but with meager difference in the statistics.

SELECT EMP.EMPNO, EMP.EMPNAME, EMPS.EMPMONTH, EMPS.EMPBASIC, EMPS.EMPHRA, EMPS.EMPCONV, EMPW.EMPWRKAREA, EMPW.EMPWRKLOC, EMPW.EMPROLE FROM EMPLOYEE EMP JOIN EMPSALARY EMPS ON EMP.EMPNO=EMPS.EMPNO JOIN EMPWORKS EMPW ON EMP.EMPNO=EMPW.EMPNO;