In today's corporate world, data is spread across multiple data sources (ex: DB2, Oracle, Sybase etc) and business users wish to generate types of business reports from a single source, no matter how the data is distributed. This scenario calls for data integration from multiple data sources and demands for new pertinent technical challenges:
a) Data sources with different formats and organized in structured, semi-structured, and unstructured data.
b) Huge data volumes and data populated from source systems arriving at different times.
c) Data quality and transforming the data into a format that is meaningful to business analysts.
Microsoft SQL Server Integration Services (SSIS) package enables developers to address the business users need. However testing of these packages introduces new challenges in testing. This scenario calls for the following needs:a) How do we validate complex sql queries?b) How do we ensure data workflow validation?c) How do we validate multiple data validation checkpoints?d) How do we ensure data correctness between the source and target, with millions of records?e) How do we approach continuous testing?
Recommended validation focus areas for this includes :
- Business rules validations and Data correctness
- Validate containers and tasks inside the SSIS packages from a functional perspective
- Validate usage of relevant field and table names, based on business rules
- Validate relevant column mapping between the source and the target
- Validate data truncation and data types between source and target
- Check for duplicate records
- Validate row count between the source and target
- Validat data correctness between the source and target
From a testing perspective, we must be able act upon these focus areas using both manual and automated testing.
Manual Testing:
List of functional and business requirements for each individual task needs to be identified and listed. SQL scripts present in the individual task needs to be validated against the functional and business requirements. SQL queries validation for appropriate field name, appropriate joins, relevant column conversion, column type-casting and relevant table names.
Test Automation:
Using test libraries , we can validate the structure of the data and data correctness. Based on the needs , i have created a test automation tool named "SSISTProbe". This tool works on the philosophy of XML tag based test automation approach. Some of key features of this tool is:
- Built-in intelligence, which executes relevant test actions based on the user XML tags
- COM based tasks and component execution , helps to execute tests in a non-UI environment
- Listeners to handle extraction of huge amount of data
- Data comparsion API's, to compare billions of records
- Multiple execution checkpoint support, which helps in enabling checkpoints in packages, databases, logs etc
This tool is open-sourced and the hompage url is
http://www.codeplex.com/SSISTProbe