Pros & Cons of Audit stage

Madhav_M
Participant
Posts: 43
Joined: Sat Jul 10, 2004 5:47 am

Pros & Cons of Audit stage

Post by Madhav_M »

Hi,
We are struggling to decide whether to go with the Audit stage or the normal Transformer approach.

In our case, business rules will keep being added and the rule set is going to be huge. The options we are considering are:
1. Use the Audit stage, define the rules as filters, generate an exception table, and use that exception table together with the source (possibly a staging table) to generate the target table.
2. Use a Transformer and filter (WHERE clause) at the source.

We think option 1 would be very easy to maintain, but we feel it is very performance intensive.

With option 2, maintenance is hectic.

Looking forward to your suggestions.

Thanks
Madhav
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

We went with business rules and constraint filters written in Transformers, directing messages to sequential files. These files were then collected and loaded into a couple of custom message tables. From these we can peruse messages for failed or dirty rows and get metrics. We audit the data as it moves through the ETL jobs. We use stage variables for most of these checks and have a standard output format so that all jobs produce the same set of message columns.

Stage variables help you organise your rules within a single Transformer at the end of a DataStage job. You can check and output multiple rule messages. Good naming conventions and standard error codes and checks help keep it organised.

We have the ability to decide whether a rule is KEEP or DROP for each row. Sometimes we want to report a rule failure but still want to deliver the row to the target.
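
As a rough illustration only (the column names, rule codes and stage variable names below are made up, not from our actual jobs), the pattern inside the Transformer looks something like this:

   * Illustrative stage variable derivations (BASIC-style expressions);
   * every name and rule code here is hypothetical.
   svRuleCode = If IsNull(lnk_in.CUST_ID) Then "ERR001" Else If Len(lnk_in.POSTCODE) <> 4 Then "ERR002" Else ""
   svAction   = If svRuleCode = "ERR002" Then "KEEP" Else "DROP"
   svMessage  = svRuleCode : "|" : svAction : "|" : lnk_in.CUST_ID

   * Output link constraints:
   *   lnk_messages : svRuleCode <> ""                        (write the message row)
   *   lnk_target   : svRuleCode = "" Or svAction = "KEEP"    (still deliver the row)

The message link then carries the same standard message columns from every job, and the KEEP/DROP decision is made per rule rather than per job.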
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At one site where I worked late last year, the requirement was similar to yours. Indeed, the business rules depended on what data were available, in addition to changing over time. We implemented a "late binding" approach, in which a driving table was used to select the appropriate business rule, which was then executed via a "rule dispatcher" routine using "indirect CALL" (which you can read about in the DataStage BASIC manual - it's the CALL @subrname(args) syntax).
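
A minimal sketch of such a dispatcher (every name below is illustrative, not that site's actual code): the driving table supplies the catalogued routine name, and the indirect CALL binds to it at run time.

   * Hypothetical rule dispatcher in DataStage BASIC. RuleName is the
   * catalogued routine name read from the driving table, RowData holds
   * the values to validate, Result comes back empty when the rule passes.
   SUBROUTINE RuleDispatcher(Result, RuleName, RowData)
      Result = ""
      SubrName = RuleName                ;* late binding: chosen at run time
      CALL @SubrName(Result, RowData)    ;* indirect CALL per the BASIC manual
   RETURN

Adding a new rule then means cataloguing a new routine with the same argument list and inserting a row in the driving table, with no change to the job itself.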

Needless to say, this was in a server job environment, not a parallel job environment. Still handling millions of rows, however.

I have never used AuditStage in-line (wasn't aware that it was easily possible), but I have used QualityStage in-line to perform data cleansing and resultant validation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.