Non-Clustered Index
What we saw till now is a very ideal situtation where we have a well defined key and can go ahead to create a clustered index. But what if the data is not so organized but we still need to have an index on the column. Enter the world of Non-Clustered Index. Where actually the data is not organized or ordered in any pre-existing condition, but still satisfy the purposes. The major difference is leaf level does not contain data. It includes pointers to the relevant data and the pointer is a simple RID column. RID columns include the information on extent, page and row offset. Using this RID column it will go and fetch the data from different pages.
Another issue with with non-clustered index is it behaves differently for heap and clustered table. Here again we will see two scenarios. One Non-clustered index created on a heap and then created on a clustered table.
Scenario 3: Non-Clustered Index on Heap Table
Let us create a table called PRODUCTS as follows:
CREATE TABLE PRODUCTS (PRODUCTNUMBER NUMERIC(6), PRODUCTNAME VARCHAR(30), PRODUCTDESCRIPTION VARCHAR(50), PRODUCTTYPE VARCHAR(1))
Now let us create a non-clustered index as follows:
CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNUMBER, PRODUCTNAME)
The following script will insert the data into this table.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=1
WHILE (@COUNTER <= 50000)
BEGIN
INSERT INTO PRODUCTS VALUES
(CAST(@COUNTER AS VARCHAR),
'PRODUCTS ' + CAST(@COUNTER AS VARCHAR),
'DESCRIPTION ' + CAST(@COUNTER AS VARCHAR),
(case (@COUNTER % 2) when 1 then 'I' when 0 then 'R' else NULL end))
SET @COUNTER=@COUNTER+1
END
END
Now let us use a simple select clause and observe the execution plan.
SELECT * FROM PRODUCTS WHERE PRODUCTNUMBER=2089
If you see this execution plan carefully, the following can be observed:
a) Since we do not have clustered index in this table, it has used RID lookup to retrieve other two columns PRODUCTDESCRIPTION and PRODUCTTYPE
b) It has used Index Seek to retrieve indexed columns PRODUCTNUMBER and PRODUCTNAME
c) Internally what happens is the optimizer uses NESTED LOOPS to join the data from the results of the two operations viz Index Seek and RID Lookup.
Is this kind of execution plan is bad? Depends on the business needs and execution time. The above query took couple of seconds to execute. Let us now select only the indexed columns and see what is the execution plan for the same query.
SELECT PRODUCTNUMBER, PRODUCTNAME FROM PRODUCTS WHERE PRODUCTNUMBER=2089
Now the execution plan is changed, no RID lookup because we have selected only the indexed columns. What if we wanted to avoid RID lookup and at the same time most of the columns in the table to be selected? Use Include option along with the definition of the index creation as follows:
DROP INDEX PRODUCTS.PRODIDX
CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNUMBER, PRODUCTNAME) INCLUDE PRODUCTDESCRIPTION, PRODUCTTYPE
The following select statement provides an execution plan
SELECT PRODUCTNUMBER, PRODUCTNAME FROM PRODUCTS WHERE PRODUCTNUMBER=2089
Please note that we did not get RID lookup, which is because of the columns included in the index creation. However, please be careful with INCLUDE column option. Please do benchmark before using it as it occupies more space in LEAF level and there on reducing the number of rows and increasing the number of IOs.
Scenario 4: Non-Clustered Index on Clustered Table
Now, in the same PRODUCTS table I have dropped the indexes and created two indexes as follows:
DROP INDEX PRODUCTS.PRODIDX
CREATE CLUSTERED INDEX PRODCLUSTIDX ON PRODUCTS(PRODUCTNUMBER)
CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNAME)
Again we have the same number of records viz 50,000 records in the table. The following select statement produces the below execution plan:
Select * from products where productname=’PRODUCTS 9201’
If you read the execution plan, the following can be observed:
a) Since we have clustered index in this table, it has used Key Lookup to retrieve other two columns PRODUCTDESCRIPTION and PRODUCTTYPE
b) It has used Index Seek to retrieve indexed columns PRODUCTNUMBER and PRODUCTNAME
c) Internally what happens is the optimizer uses NESTED LOOPS to join the data from the results of the two operations viz Index Seek and Key Lookup.
Further please note that in case if the KEY LOOKUP is to be avoided, use INCLUDE option while creating the index.
Non-Clustered index can be used as part of heap table or as part of clustered table. There is no hard and fast rule that it has to be used only with heap or clustered table. Before using the same we need to benchmark and test it completely.
| Further Reading | Please don’t forget to visit this site to read my next articles on Ø Joins and Sets Ø Hierarchical Queries |