Sunday, November 28, 2010

Bitmap Indexes - Part 2

Case 1: Using BITMAP for BULK Load.

To illustrate this, I have created a table called bitmapdemotable as provided below:

    create table bitmapdemotable (runnumber number(6), runname varchar2(15), runtype number(2));

I have also created a procedure to populate this table. Please note that I am using a "While Loop" here to simulate row-by-row insert operation.

    create or replace procedure bitmapproc(noofrecords in number) is
    counter number;
    begin
        counter := 1;
        while (counter <= noofrecords)
        loop
            insert into bitmapdemotable values (counter, 'a' || counter,mod(counter,2));
            counter := counter + 1;
        end loop;
        commit;
    end;

Next, I have created an index on this table for the column "runtype" which actually holds either 0 or 1.

    create bitmap index bitmapidx on bitmapdemotable(runtype);

In order to study the behaviour, we need to use TKPROF. For this we need to change the configuration settings of TIMED_STATISTICS and SQL_TRACE

    alter session set timed_statistics=true;
    alter session set sql_trace=true;

Now, I am executing the procedure by passing the value as 50000.

    exec bitmapproc(50000)

When we see the trace file (after extracting the contents of the trace file into a local text file using TKPROF), it looks like the following figure:

The initial is for the procedure as a whole (does not include the data for the insert statement as it is separately captured) and the second is for the insert statement. Please note that the insert statement took more CPU, elapsed time and logical I/Os (both query and current). Now, please also note that ELAPSED is greater than CPU, which may be due to some wait events.

Now, instead of row-by-row insert we have captured the statistics for the bulk load and the result is shown in the following figure:
As expected, the SELECT statement, led to FULL TABLE SCAN. But, when we look at the  statistics, we can observe vast difference between the row-by-row and BULK load. The reason being, when we do a row-by-row insert, the B-tree is used to locate the leaf nodes that contain bitmap segments for a given value of the key. Start ROWID and the bitmap segments are used to locate the rows that contain the key value. When changes are made to the key column in the table, bitmaps must be modified. This results in the locking of the relevant bitmap segments. Because locks are acquired on the whole bitmap segment, a row that is covered by the bitmap cannot be updated by other transactions until the first transaction ends. Oracle server should repeat this for every row that gets either inserted or updated. Hence row by row manipulation on the rows consists of bitmap index is costly.

Bitmap Indexes - Part 4

SQL> select * from bitmapdemotable where runtype=1;
125000 rows selected.
SQL> select * from btreedemotable where runtype=1;
125000 rows selected.
This states that as far as the query is concerned, there is no difference between the query results on the tables that actually uses btree index and bitmap index. We have considered the scenario where we have two distinct values for run type. Now, let us create the similar type of indexes in another column in the same table and check. Also, we will drop the existing indexes on the runtype column.

drop index btreeidx;
drop index bitmapidx;

To create the indexes on runname column:

create index runnamebtree on btreedemotable(runname);
create bitmap index runnamebitmap on bitmapdemotable(runname);

exec DBMS_STATS.gather_index_stats('HARI', 'RUNNAMEBTREE');
exec DBMS_STATS.gather_index_stats('HARI', 'RUNNAMEBITMAP');

SQL> select * from bitmapdemotable where runname='a1000';
SQL> select * from btreedemotable where runname='a1000';
I actually tried to insert into the tables (with index on runname) and found the following:

SQL> insert into btreedemotable select * from btreedemotable;
250000 rows created.
SQL> insert into bitmapdemotable select * from bitmapdemotable;
250000 rows created.
Please notice the huge difference between the tables having different indexes on the similar column.

To conclude:
a)     There is no much difference between b*tree and bitmap as far as the query execution is concerned. Both are exhibiting similar statistics
b)     The difference comes when we try to perform DML operations on the indexes.

Bitmap Indexes - Part 3

Case 2: Comparison between B*Tree and Bitmap

Let us illustrate this point by identifying when a B*Tree can be used and when we should use Bitmap. Let us consider the same example of what we have done it in case 1 and see if there is any difference when we use b*tree instead of bitmap

Let us create a table for creating a b*tree index, as follows:

   create table btreedemotable as select * from bitmapdemotable where 1=0;
   create index btreeidx on btreedemotable(runtype);

Let us create the same procedure, but to populate btreedemotable.

create or replace procedure btreeproc(noofrecords in number) is
counter number;
begin
        counter := 1;
        while (counter <= noofrecords)
        loop
            insert into btreedemotable values (counter, 'a' || counter,mod(counter,2));
            counter := counter + 1;
        end loop;
        commit;
end;

alter session set timed_statistics=true;
alter session set sql_trace=true;
Please note that the row-by-row insert for a b*tree index based table works like a champ as compared to bitmap index based table. Now, we will try to select it from the table and see if there are any differences:

SQL> select * from btreedemotable where runtype=1;
25000 rows selected.
SQL> select * from bitmapdemotable where runtype=1;
25000 rows selected.
Except for the minor changes in the statistics and cost, there seems to be no big difference between the two tables having different index types. Now, let us check for the query that returns 125,000 records.

Bitmap Indexes



Database Version: Oracle 10g
Operating System: Windows XP

I was working on tuning a procedure in my current company, when I found something interesting with my colleague. He has created a table with a bitmap index for one of the columns. The insert into the table was found to be very slow and when I checked in the process, he is actually doing a single row operation (of around 50K every day) which is actually taking couple of minutes to insert. There was actually an argument with him and he claims that the slow in insert operation is not due to bitmap index and it is because of something else. Further he claimed that BITMAP index is similar in B*Tree index. Not only him, many could not understand when we should create BITMAP INDEX and what actually it is.

In this article, I will explain what is BITMAP Index, their operational difference (or comparison) and when do we need to create BITMAP indexes. As always said, please do provide your comments.

What is BITMAP index?

Oracle created BITMAP index to cater the need for data warehousing and adhoc query mechanism. In general, usage of bitmap index in OLTP is not a highly advisable solution. In a bitmap index, a bitmap for each key value replaces a list of rowids. Each bit in the bitmap corresponds to a possible rowid, and if the bit is set, it means that the row with the corresponding rowid contains the key value. Oracle has created a proprietary mapping function that converts the bit position to an actual rowid, so that the bitmap index provides the same functionality as a regular index. Bitmap indexes store the bitmaps in a compressed way.

Bitmap indexes are most appropriate on low distinct cardinality data (i.e., data with relatively few discrete values when compared to the cardinality of the entire set. It is difficult to define what low distinct cardinality is?. This is data where the number of distinct items in the set of rows divided by the number of rows is a small number (near zero). For example, in a table of 10000 rows, 1 or 2 distinct valued column would be appropriate for a bitmap index. However, in a table of 1000000 rows, 100 or 200 distinct valued column would be appropriate for a bitmap index. I have seen people telling me that columns with more than 5 or 10 distinct columns cannot participate in a bitmap index, which is actually wrong.

In general, when the data is packed for bitmap index, it is not packed like b*tree index. In case of b*tree index, the leaf node will have a list of rowids. In case of bitmap index, the leaf node contains bitmap, with the value set if the key value is available and not set if not. Each leaf node contains the entry header, key, start rowid, end rowid and bitmap segment. A bitmap segment consisting of a string of bits. The bit is set when the corresponding row contains the key value and is unset when the row does not contain the key value. Entry header, which contains the information on the number of columns and lock information. The Oracle server uses a patented compression technique to store bitmap segments and also bitmap Index uses restricted rowid format

Difference between Bitmap and B*Tree Indexes

Bitmap indexes actually store pointers to many rows with a single index key entry, as compared to a B*Tree. In B*Tree index, we will be able to pair the index keys and the corresponding rows in a table. In a bitmap index, there will be a very small number of index entries, each of which points to many rows. In a conventional B*Tree, one index entry points to a single row.

When do we need BITMAP index?


We need BITMAP index for data warehousing application, Adhoc Query Processing, and when the data load is quiet large(and we have bulk processing). We should never use a BITMAP index for OLTP processing. We shall consider couple of scenarios where in we check on the usage of BITMAP index and study the response. On the first case, I will show what happens if we use BITMAP index for row-by-row insert and on the second case, we shall see the operational difference between b*tree index and Bitmap Index.

Part 2     Part 3   Part 4