Simple audit should compare numbers from the source to the target warehouse. It should count or sum some number of rows to the source.
AGGREGATE TOTALS:
1. Did all the rows in the source end up in the target?
2. Do the columns in the fact table total the same to the source?
3. In aggregate do you match?
Do simple counts of rows in the target compared to row counts of source tables. This is valid for dimension tables. The sums should validate the fact tables.
DETAIL TOTALS:
1. Sum last month's transactions for a single customer. Do they match?
2. If you are loading on a daily or incremental basis then total all tranasctions in the source for yesterday compare to today's loaded records.
3. Do the same for one dimension record for each dimension table.
If you do not know that your data is accurate at an aggregate level then any detail level validation is worthless. Joining a target table to a source table and seeing that a specific column matches proves nothing but that one column is accurate for that specific set of records. If you are accurate at the global or aggregate level then this does prove that the detail level is probably accurate as well.
DO NOT USE THE ETL TO VALIDATE THE ETL.
If you extract data into a staging table and then compare the target to the staging table proves nothing. The original extract which built the staging table could be wrong.
Simple checks:
1. Select count(*) from Customers
(write the result into
ETL_AUDIT.Source_Target = "S"
ETL_AUDIT.Target_Table = "CUSTOMER_DIM"
ETL_AUDIT.MEASURE = Count(*)
2. Select count(*) from Customer_Dim
(write the result into
ETL_AUDIT.Source_Target = "T"
ETL_AUDIT.Target_Table = "CUSTOMER_DIM"
ETL_AUDIT.MEASURE = Count(*)
3. Do this for every dimension.
Calculate the completeness or row count of target divided by row count for source for each table.
Fact tables are more complicated to validate or QA.
1.
Code: Select all
Select sum(Invoice_Amt) from Invoices
ETL_AUDIT.Source_Target = "S"
ETL_AUDIT.Target_Table = "INVOICE_FACT"
ETL_AUDIT.MEASURE = sum()
2.
Code: Select all
Select sum(Invoice_Amount) from Invoice_Fact
ETL_AUDIT.Source_Target = "T"
ETL_AUDIT.Target_Table = "INVOICE_FACT"
ETL_AUDIT.MEASURE = sum()
All of this is a part of EtlStats. ETL_AUDIT is called ETL_QA_RESULT. The SQL to calculate it measure is stored in ETL_QA_SQL.SQL_TO_RUN. The completeness report is stored in ETL_QA_RESULT_MAP. The jobs are included. The DDL for each table is included. This is all functional and working at my last customer and soon will be working at my current customer.