Pros & Cons of Audit stage

Madhav_M
Participant
Posts: 43
Joined: Sat Jul 10, 2004 5:47 am

Pros & Cons of Audit stage

Post by Madhav_M »

Hi,
We are struggling to decide whether to go with the Audit stage or the normal Transformer approach.

In our case, business rules will keep being added and the rule set is going to be huge. The options we are considering are:
1. Use the Audit stage, define the rules as filters, generate an exception table, and use that exception table together with the source (possibly a staging table) to generate the target table.
2. Use a Transformer and filter (WHERE clause) at the source.

We think option 1 would be very easy to maintain, but we feel it is very performance intensive.

With option 2, maintenance is hectic.

Looking forward to your suggestions.

Thanks
Madhav
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

We went with business rules and constraint filters written in Transformers, directing messages to sequential files. These files were then collected and loaded into a couple of custom message tables. From these we can peruse messages for failed or dirty rows and get metrics. We audit the data as it moves through the ETL jobs. We use stage variables for most of these checks and have a standard output format so that all jobs produce the same set of message columns.

Stage variables help you organise your rules within a single Transformer at the end of a DataStage job. You can check and output multiple rule messages. Good naming conventions and standard error codes and checks help keep it organised.

We have the ability to decide whether a rule is KEEP or DROP for each row. Sometimes we want to report a rule failure but still want to deliver the row to the target.
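
As a rough illustration only (the column names, rule codes and stage variable names below are made up, not from our actual jobs), the pattern inside the Transformer looks something like this:

   * Illustrative stage variable derivations (BASIC-style expressions);
   * every name and rule code here is hypothetical.
   svRuleCode = If IsNull(lnk_in.CUST_ID) Then "ERR001" Else If Len(lnk_in.POSTCODE) <> 4 Then "ERR002" Else ""
   svAction   = If svRuleCode = "ERR002" Then "KEEP" Else "DROP"
   svMessage  = svRuleCode : "|" : svAction : "|" : lnk_in.CUST_ID

   * Output link constraints:
   *   lnk_messages : svRuleCode <> ""                        (write the message row)
   *   lnk_target   : svRuleCode = "" Or svAction = "KEEP"    (still deliver the row)

The message link then carries the same standard message columns from every job, and the KEEP/DROP decision is made per rule rather than per job.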
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At one site where I worked late last year, the requirement was similar to yours. Indeed, the business rules depended on what data were available, in addition to changing over time. We implemented a "late binding" approach, in which a driving table was used to select the appropriate business rule, which was then executed via a "rule dispatcher" routine using "indirect CALL" (which you can read about in the DataStage BASIC manual - it's the CALL @subrname(args) syntax).
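
A minimal sketch of such a dispatcher (every name below is illustrative, not that site's actual code): the driving table supplies the catalogued routine name, and the indirect CALL binds to it at run time.

   * Hypothetical rule dispatcher in DataStage BASIC. RuleName is the
   * catalogued routine name read from the driving table, RowData holds
   * the values to validate, Result comes back empty when the rule passes.
   SUBROUTINE RuleDispatcher(Result, RuleName, RowData)
      Result = ""
      SubrName = RuleName                ;* late binding: chosen at run time
      CALL @SubrName(Result, RowData)    ;* indirect CALL per the BASIC manual
   RETURN

Adding a new rule then means cataloguing a new routine with the same argument list and inserting a row in the driving table, with no change to the job itself.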

Needless to say, this was in a server job environment, not a parallel job environment. Still handling millions of rows, however.

I have never used AuditStage in-line (wasn't aware that it was easily possible), but I have used QualityStage in-line to perform data cleansing and resultant validation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.