Now let us start loading the data using the following script.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=300000
WHILE (@COUNTER <= 500000)
begin
INSERT INTO ACCOUNTS VALUES (@COUNTER, ('ACCOUNT ' + CAST(@COUNTER AS VARCHAR)), ('ACCOUNT DESC ' + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN 'A' WHEN 2 THEN 'L' WHEN 3 THEN 'E' WHEN 4 THEN 'I' ELSE NULL END))
set @COUNTER=@COUNTER+1
end
END
It took 1 minute to execute the above script and Page Splits per second increased from 18,562 to 19,604. It took just 1042 page splits per second to load 150,000 records. Fragment details are provided below:
Now let us start working on the second set of data and observe the page splits and fragmentation.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=300000
WHILE (@COUNTER <= 500000)
begin
INSERT INTO ACCOUNTS VALUES (@COUNTER, ('ACCOUNT ' + CAST(@COUNTER AS VARCHAR)), ('ACCOUNT DESC ' + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN 'A' WHEN 2 THEN 'L' WHEN 3 THEN 'E' WHEN 4 THEN 'I' ELSE NULL END))
set @COUNTER=@COUNTER+1
end
END
It took 1 minute and 39 seconds to execute the above script and Page Splits per second increased from 19,604 to 21,030. It took just 1426 page splits per second to load 200,000 records. Fragment details are provided below:
The most interesting part is the third set, which actually inserts the data in between the existing data. This will increase the page splits further and cause the cascading effects on the remaining data as well. This should not cause any problem if the clustering key is defined properly. Let us execute the following script to insert the missing data.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=150001
WHILE (@COUNTER <= 299999)
begin
INSERT INTO ACCOUNTS VALUES (@COUNTER, ('ACCOUNT ' + CAST(@COUNTER AS VARCHAR)), ('ACCOUNT DESC ' + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN 'A' WHEN 2 THEN 'L' WHEN 3 THEN 'E' WHEN 4 THEN 'I' ELSE NULL END))
set @COUNTER=@COUNTER+1
end
END
It took 1 minute to execute the above script and Page Splits per second increased from 21,030 to 22,099. It took just 1069 page splits per second to load 150,000 records. Fragment details are provided below:
Scenario 2: Not So Well Defined Clustered Key
Now let us change the scenario and use a calculated column and index the same. By calculated column, I mean a column that is derived from another column or a combination of values that can be put in one column. For this, I will create another table called PurchaseMaster as follows:
CREATE TABLE [dbo].[purchasemaster](
[purchasenumber] [varchar](20) NULL,
[purchasedescription] [varchar](30) NULL,
[vendorcode] [varchar](15) NULL,
[dateofpurchase] [datetime] NULL
)
A clustered index was also created on the column purchasenumber as follows:
CREATE CLUSTERED INDEX PURCHASEMASTERIDX ON PURCHASEMASTER(PURCHASENUMBER)
The column PURCHASENUMBER is derived on the basis of date of purchase. For example, the following is the sample data available in the PURCHASENUMBER column.
purchasenumber
1\2011-2012
10\2011-2012
100\2011-2012
1000\2011-2012
101\2011-2012
102\2011-2012
103\2011-2012
104\2011-2012
105\2011-2012
106\2011-2012
Please note that each purchase number is appended with the financial year. The purchase number 100 has the financial year 2011-2012 and the same for the other purchase numbers. As soon as the person who is raising the purchase order looks at this, the person will know the financial year details. This is indeed very useful and infact very handy as well. Also, when we issue the following statement, we did not see any anomaly.
SELECT * FROM PURCHASEMASTER WHERE PURCHASENUMBER='1000\2011-2012'
The execution plan for the above select statement is provided below:
Now, if we see a range filter based query as follows, the results were not proper.
SELECT * FROM PURCHASEMASTER WHERE PURCHASENUMBER>'1\2011-2012' AND PURCHASENUMBER <'101\2011-2012'
purchasenumber purchasedescription vendorcode dateofpurchase
10\2011-2012 Purchase 10 Vendor 10 2011-05-30 19:08:48.653
100\2011-2012 Purchase 100 Vendor 100 2011-05-30 19:08:48.687
1000\2011-2012 Purchase 1000 Vendor 1000 2011-05-30 19:08:49.217
Now there are no records in the table and entire data was truncated. We will now study the page splits and the fragmentation details for this clustered index. Initially the page splits is calculated using the following query and was observed to be 9325.
SELECT * FROM SYS.SYSPERFINFO WHERE COUNTER_NAME = 'PAGE SPLITS/SEC' AND OBJECT_NAME LIKE '%ACCESS METHODS%'
Also, the fragmentation details for this table and index is illustrated below:
Now we will insert two sets of data, purchase number from 1 to 150,000 and from 300,000 to 500,000 with a gap of 150,000 between the sets of data. After inserting every set of data, we will observe the corresponding page splits and fragment.
The following script will be used to generate these sets of data.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=1
WHILE (@COUNTER <= 150000)
BEGIN
INSERT INTO PURCHASEMASTER VALUES
(CAST(@COUNTER AS VARCHAR) + '\' + cast(YEAR(GETDATE()) as varchar) + '-' + cast(YEAR(GETDATE())+1 as varchar),
'PURCHASE ' + CAST(@COUNTER AS VARCHAR),
'VENDOR ' + CAST(@COUNTER AS VARCHAR),
GETDATE())
SET @COUNTER=@COUNTER+1
END
END
Inserting 150,000 records took just 1 minute and 3 seconds. The page splits/second and fragmentation after this are provided below:
From 9325, page splits/second has increased to 11523 which caused 2198 page splits per second during this insert operation.
Also the fragment details are provided below:
Please note that the fragmentaion is observed to be high in this case. Now, let us consider the next set, which is inserting next set of 200,000 records from 300,000 to 500,000.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=300000
WHILE (@COUNTER <= 500000)
BEGIN
INSERT INTO PURCHASEMASTER VALUES
(CAST(@COUNTER AS VARCHAR) + '\' + cast(YEAR(GETDATE()) as varchar) + '-' + cast(YEAR(GETDATE())+1 as varchar),
'PURCHASE ' + CAST(@COUNTER AS VARCHAR),
'VENDOR ' + CAST(@COUNTER AS VARCHAR),
GETDATE())
SET @COUNTER=@COUNTER+1
END
END
Inserting 200,000 records took just 1 minute and 18 seconds. The page splits/second and fragmentation after this are provided below:
From 11,523, page splits/second has increased to 14,947 which caused 3,424 page splits per second during this insert operation. Also the fragment details are provided below:
Wow, that is a great increase in fragmentation.
Now the final is what more important to observe both Page Splits and Fragmentation. We are now going to insert the missing records between 150,000 and 300,000 and observe the page splits. I will also explain diagramatically how this would happen. Now let us execute the same script by changing the values.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=150001
WHILE (@COUNTER <= 299999)
BEGIN
INSERT INTO PURCHASEMASTER VALUES
(CAST(@COUNTER AS VARCHAR) + '\' + cast(YEAR(GETDATE()) as varchar) + '-' + cast(YEAR(GETDATE())+1 as varchar),
'PURCHASE ' + CAST(@COUNTER AS VARCHAR),
'VENDOR ' + CAST(@COUNTER AS VARCHAR),
GETDATE())
SET @COUNTER=@COUNTER+1
END
END
The third batch took just 57 seconds to complete. Page Splits per second was observed to be 17,520 which caused 2,573 page splits per second during this insert operation. Also the fragment details are provided below:
Now having said this on page splits and fragmentation, how do we know if this is good or bad? Let us compare the scenario 1 and 2 in the following table and observe the results.
| Description | Execution Time (mm:ss) | Page Splits | Fragmentation (AVG_FRAGMENTATION_ IN_PERCENT FOR INDEX_LEVEL=0) | |
| Scenario 1 | 1-150,000 | 01:00 | 1,042 | 0.68 |
| 300,000-500,000 | 01:39 | 1,426 | 0.57 | |
| 150,001-299,999 | 01:00 | 1,069 | 0.62 | |
| Scenario 2 | 1-150,000 | 01:03 | 2,198 | 35.21 |
| 300,000-500,000 | 01:18 | 3,424 | 41.87 | |
| 150,001-299,999 | 00:57 | 2,573 | 43.36 | |
This shows in Scenario 2, as the cluster key is not well defined, the inserts to fill the missing gap (150,001 to 299,999) is observed to be high. Also, the fragmentation and Page Splits are quite high since the cluster key is not well defined. Whenever we need to define a clustered index on a key, please make sure the following:
a) Key should be well defined, satisfying the business requirements and as well as the providing a greater selectivity.
b) Key should not be defined on derived table and calculated column
c) Other columns in the table should be completely dependent on the cluster key
No comments:
Post a Comment