Introduction
There was a time where many have not realized the importance of indexes in database world. Indexes were predominantly used to decrease the execution time of a query without understanding the side effects it would cause on DML statements. However, in this contemporary world where lots and lots of information are available in the net, many have realized that indexes are to be used carefully and used only when a need arises. This article even though provides information on the basics, we will also delve into the internals on how these indexes react under the cover.
People who travel to the world of SQL Server from other databases will obviously have to confine themselves with only two types of indexes viz Clustered Indexes and Non-Clustered indexes. The design and architecture of SQL Server allows the user to play with these indexes alone (even though SQL Server 2008 has provided a new feature called FILTERED INDEXES, which we will see in next article). We will work on the different scenarios, observe the results and provide a logical conclusion towards the end. Please note that this is not a universal conclusion and results may vary depending on the data size, underlying OS, hardware configuration, number of concurrent users etc.
I will be using SQL Server 2008 on Windows XP. As and when needed, I will also provide the information on the data size and other variable factors.
Definitions
Before we dive deep into this article, it would be better if we understand the terminologies and their definitions as it would be easy to understand the contents in this article.
| Fragmentation | A process by which the spaces occur due to frequent updates and deletes in the page. More this fragmentation more the performance issues or maintenance will be. |
| Index Level | This value provides the number of levels from the bottom up of an index. Please note that Leaf Node will always have 0. |
| Clustered Index Seek | Index seek uses a clustered index to look up the rows to be returned directly |
| Index Depth | Number of levels we need to pass to get into the index. |
| Record Count | Number of records available in the page. In general this value at index level 0 will be the number of records in the table. But in case of any forwarding count, then there will be some differences |
| Extent | An extent is the basic unit of storage used to allocate space for tables and indexes within a given data file. It is made up of eight contiguous 64KB data pages |
| Page | A page is the unit of allocation within a specific extent. There are eight pages in every extent. Please note that a page is the last level you reach before hitting at the actual data row. |
| Page Splits | Whenever a page becomes full, half of the data is shifted to the newly created page and pointers are kept in the old page pointing to the new location of the data. This is called Page Split. In case of a clustered index the inserted row which does not have enough space alone is shifted to the newly created page. |
| Row Offset | For every row that we insert into the database, SQL Server will place the “Row Offset” located at the end of the page, to actually indicate where in that page the row data starts. Indirectly it tells us how many spaces we need to leave to read the data from the top. |
| Index Seek | Index seek uses a non-clustered index to look up the rows to be returned directly |
| Nested Loops | Takes the inputs from two data sets and performs the scanning of the outer set for each inner set. Very efficient operation if the data sets are small. |
| RID Lookup | The query optimizer performs a RID LookUp, a type of bookmark lookup that occurs on a heap table and uses a row identifier to find the rows to return. |
| Key Lookup | A Key Lookup is a bookmark lookup on a clustered table and which essentially means that the optimizer cannot retrieve the rows in a single operation, and has to use a clustered key to return the corresponding rows from a clustered index (or from the table itself). |
| AVERAGE_FRAGMENTATION_ IN_PERCENT | The degree of fragmentation in the index tree which is based on pages or extents that are not in sync or out of order (i.e.) the pointer to the logical next page is not the same as the physical next page. Please note that we need to be looking for a low number here, though how low depends on the the scenario and requirements |
Clustered Index
Clustered index, as the name suggests, the indexes are clustered in one place with the data. The major advantage (in some cases disadvantage) with clustered index is the arrangement of data in one particular order. Since the data is arranged in one order, we can have only one clustered index in a table. By default, as soon as a primary key for a table is created, internally SQL Server creates a clustered index. However, when we create a unique key, we can specify either clustered or non-clustered index. Creating a clustered index is a time consuming task and in case if this is not defined properly then it may lead to lot of performance issues. In this section, we will see where this can be a gain and where this can lead to performance issues.
Scenario 1: Well defined Cluster Key
Let us create a table called “Accounts” having information related to a company accounts as follows:
CREATE TABLE ACCOUNTS (ACCOUNTNUMBER NUMBER(6), ACCOUNTNAME VARCHAR(50), ACCOUNTDESCRIPTION VARCHAR(50), ACCOUNT_TYPE VARCHAR(1))
Let us now create a clustered index on this table on the column ACCOUNTNUMBER.
CREATE CLUSTERED INDEX ACCNTCLUSTIDX ON ACCOUNTS(ACCOUNTNUMBER);
In order to test various scenarios, let me load 50,000 records into this table using the following script:
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=1
WHILE (@COUNTER <= 50000)
INSERT INTO ACCOUNTS VALUES (@COUNTER, ('ACCOUNT ' + CAST(@COUNTER AS VARCHAR)), ('ACCOUNT DESC ' + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN 'A' WHEN 2 THEN 'L' WHEN 3 THEN 'E' WHEN 4 THEN 'I' ELSE NULL END))
END
We have four different types through which we classify the account as Asset (A), Liability (L), Expense (E) and Income (I).
Now to insert these many records using SQL Server Management Studio (SSMS), it took around 23 seconds.
The following query is a simple select statement filtering the data on accountnumber.
SELECT * FROM ACCOUNTS WHERE ACCOUNTNUMBER < 1000
The above query got executed in no time and returned the 999 rows. The actual execution plan showed that the query took the CLUSTERED INDEX SEEK plan, which actually used the index and other parameters (or some authors term as attributes) are also well within the bounds. Please also note that since the key what we have chosen to index is well organized, whatever be the filter on the clustered key it would use the index.
Since this clustered key is well organized, even the page splits and fragmentation should be well within the limits. We can look at this in three different angles. First let us truncate the existing records in this table and load the table with three sets of data. First data will be from 1 to 150,000. Second will be from 300,000 and 500,000. And final set will be between 150,000 and 300,000. At the initial stage before starting this experiment the page splits per second is 18,562 as calculated through the following query:
SELECT * FROM SYS.SYSPERFINFO WHERE COUNTER_NAME = 'PAGE SPLITS/SEC' AND OBJECT_NAME LIKE '%ACCESS METHODS%'
Now the fragmentation for the index on this table is provided below:
No comments:
Post a Comment