Next generation strategy and solutions: SQL Server Indexes

Now let us start loading the data using the following script.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=300000

WHILE (@COUNTER <= 500000)

begin

INSERT INTO ACCOUNTS VALUES (@COUNTER, ('ACCOUNT ' + CAST(@COUNTER AS VARCHAR)), ('ACCOUNT DESC ' + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN 'A' WHEN 2 THEN 'L' WHEN 3 THEN 'E' WHEN 4 THEN 'I' ELSE NULL END))

set @COUNTER=@COUNTER+1

end

END

It took 1 minute to execute the above script and Page Splits per second increased from 18,562 to 19,604. It took just 1042 page splits per second to load 150,000 records. Fragment details are provided below:

Now let us start working on the second set of data and observe the page splits and fragmentation.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=300000

WHILE (@COUNTER <= 500000)

begin

set @COUNTER=@COUNTER+1

end

END

It took 1 minute and 39 seconds to execute the above script and Page Splits per second increased from 19,604 to 21,030. It took just 1426 page splits per second to load 200,000 records. Fragment details are provided below:

The most interesting part is the third set, which actually inserts the data in between the existing data. This will increase the page splits further and cause the cascading effects on the remaining data as well. This should not cause any problem if the clustering key is defined properly. Let us execute the following script to insert the missing data.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=150001

WHILE (@COUNTER <= 299999)

begin

set @COUNTER=@COUNTER+1

end

END

It took 1 minute to execute the above script and Page Splits per second increased from 21,030 to 22,099. It took just 1069 page splits per second to load 150,000 records. Fragment details are provided below:

Scenario 2: Not So Well Defined Clustered Key

Now let us change the scenario and use a calculated column and index the same. By calculated column, I mean a column that is derived from another column or a combination of values that can be put in one column. For this, I will create another table called PurchaseMaster as follows:

CREATE TABLE [dbo].[purchasemaster](

[purchasenumber] [varchar](20) NULL,

[purchasedescription] [varchar](30) NULL,

[vendorcode] [varchar](15) NULL,

[dateofpurchase] [datetime] NULL

)

A clustered index was also created on the column purchasenumber as follows:

CREATE CLUSTERED INDEX PURCHASEMASTERIDX ON PURCHASEMASTER(PURCHASENUMBER)

The column PURCHASENUMBER is derived on the basis of date of purchase. For example, the following is the sample data available in the PURCHASENUMBER column.

purchasenumber

1\2011-2012

10\2011-2012

100\2011-2012

1000\2011-2012

101\2011-2012

102\2011-2012

103\2011-2012

104\2011-2012

105\2011-2012

106\2011-2012

Please note that each purchase number is appended with the financial year. The purchase number 100 has the financial year 2011-2012 and the same for the other purchase numbers. As soon as the person who is raising the purchase order looks at this, the person will know the financial year details. This is indeed very useful and infact very handy as well. Also, when we issue the following statement, we did not see any anomaly.

SELECT * FROM PURCHASEMASTER WHERE PURCHASENUMBER='1000\2011-2012'

The execution plan for the above select statement is provided below:

Now, if we see a range filter based query as follows, the results were not proper.

SELECT * FROM PURCHASEMASTER WHERE PURCHASENUMBER>'1\2011-2012' AND PURCHASENUMBER <'101\2011-2012'

purchasenumber purchasedescription vendorcode dateofpurchase

10\2011-2012 Purchase 10 Vendor 10 2011-05-30 19:08:48.653

100\2011-2012 Purchase 100 Vendor 100 2011-05-30 19:08:48.687

1000\2011-2012 Purchase 1000 Vendor 1000 2011-05-30 19:08:49.217

Now there are no records in the table and entire data was truncated. We will now study the page splits and the fragmentation details for this clustered index. Initially the page splits is calculated using the following query and was observed to be 9325.

SELECT * FROM SYS.SYSPERFINFO WHERE COUNTER_NAME = 'PAGE SPLITS/SEC' AND OBJECT_NAME LIKE '%ACCESS METHODS%'

Also, the fragmentation details for this table and index is illustrated below:

Now we will insert two sets of data, purchase number from 1 to 150,000 and from 300,000 to 500,000 with a gap of 150,000 between the sets of data. After inserting every set of data, we will observe the corresponding page splits and fragment.

The following script will be used to generate these sets of data.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=1

WHILE (@COUNTER <= 150000)

BEGIN

INSERT INTO PURCHASEMASTER VALUES

(CAST(@COUNTER AS VARCHAR) + '\' + cast(YEAR(GETDATE()) as varchar) + '-' + cast(YEAR(GETDATE())+1 as varchar),

'PURCHASE ' + CAST(@COUNTER AS VARCHAR),

'VENDOR ' + CAST(@COUNTER AS VARCHAR),

GETDATE())

SET @COUNTER=@COUNTER+1

END

Inserting 150,000 records took just 1 minute and 3 seconds. The page splits/second and fragmentation after this are provided below:

From 9325, page splits/second has increased to 11523 which caused 2198 page splits per second during this insert operation.

Also the fragment details are provided below:

Please note that the fragmentaion is observed to be high in this case. Now, let us consider the next set, which is inserting next set of 200,000 records from 300,000 to 500,000.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=300000

WHILE (@COUNTER <= 500000)

BEGIN

INSERT INTO PURCHASEMASTER VALUES

(CAST(@COUNTER AS VARCHAR) + '\' + cast(YEAR(GETDATE()) as varchar) + '-' + cast(YEAR(GETDATE())+1 as varchar),

'PURCHASE ' + CAST(@COUNTER AS VARCHAR),

'VENDOR ' + CAST(@COUNTER AS VARCHAR),

GETDATE())

SET @COUNTER=@COUNTER+1

END

Inserting 200,000 records took just 1 minute and 18 seconds. The page splits/second and fragmentation after this are provided below:

From 11,523, page splits/second has increased to 14,947 which caused 3,424 page splits per second during this insert operation. Also the fragment details are provided below:

Wow, that is a great increase in fragmentation.

Now the final is what more important to observe both Page Splits and Fragmentation. We are now going to insert the missing records between 150,000 and 300,000 and observe the page splits. I will also explain diagramatically how this would happen. Now let us execute the same script by changing the values.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=150001

WHILE (@COUNTER <= 299999)

BEGIN

INSERT INTO PURCHASEMASTER VALUES

(CAST(@COUNTER AS VARCHAR) + '\' + cast(YEAR(GETDATE()) as varchar) + '-' + cast(YEAR(GETDATE())+1 as varchar),

'PURCHASE ' + CAST(@COUNTER AS VARCHAR),

'VENDOR ' + CAST(@COUNTER AS VARCHAR),

GETDATE())

SET @COUNTER=@COUNTER+1

END

The third batch took just 57 seconds to complete. Page Splits per second was observed to be 17,520 which caused 2,573 page splits per second during this insert operation. Also the fragment details are provided below:

Now having said this on page splits and fragmentation, how do we know if this is good or bad? Let us compare the scenario 1 and 2 in the following table and observe the results.

Description		Execution Time (mm:ss)	Page Splits	Fragmentation (AVG_FRAGMENTATION_ IN_PERCENT FOR INDEX_LEVEL=0)
Scenario 1	1-150,000	01:00	1,042	0.68
	300,000-500,000	01:39	1,426	0.57
	150,001-299,999	01:00	1,069	0.62
Scenario 2	1-150,000	01:03	2,198	35.21
	300,000-500,000	01:18	3,424	41.87
	150,001-299,999	00:57	2,573	43.36

This shows in Scenario 2, as the cluster key is not well defined, the inserts to fill the missing gap (150,001 to 299,999) is observed to be high. Also, the fragmentation and Page Splits are quite high since the cluster key is not well defined. Whenever we need to define a clustered index on a key, please make sure the following:

a) Key should be well defined, satisfying the business requirements and as well as the providing a greater selectivity.

b) Key should not be defined on derived table and calculated column

c) Other columns in the table should be completely dependent on the cluster key

Next generation strategy and solutions

Thursday, June 2, 2011

SQL Server Indexes – An Insider Look - Part 2

Scenario 2: Not So Well Defined Clustered Key

No comments: