
Error Processing

Posted: Tue Jan 02, 2007 7:11 pm
by pradkumar
Hi

What are an Error Processing table and an Audit table?
Please let me know the concept behind these tables.
How can these tables be built?

Posted: Tue Jan 02, 2007 7:55 pm
by ray.wurlod
Something a site chooses to implement.

Nothing to do with DataStage, though that may be the tool used to populate these tables.

My guesses would be as follows.
  • Error processing table records any errors generated within ETL processing.

  • Audit table is used to store row count reconciliations.

Posted: Wed Jan 03, 2007 4:09 am
by kumar_s
Hi pradkumar,
Could you let us know what this is in regard to?
It could be either the normal naming convention used at your site or project for the kind of tables Ray described, or it could refer to the database's internal tables.

Posted: Wed Jan 03, 2007 4:26 pm
by pradkumar
This is for keeping track of errors. It is a good practice for reconciliation, and also for knowing what errors occurred and how to reprocess them.
I have not built one until now.
I need to get a clearer picture of how to build it and populate it.

Posted: Wed Jan 03, 2007 4:30 pm
by kumar_s
As mentioned earlier, it is one of the processes you may choose to follow in your development. An error processing table will mostly be the one that collects all the rejected rows along with their error codes. There are several approaches, depending on your methodology. It is more flexible if you use a stored procedure.
An audit table will be the one that holds the total row count, start time and end time for all the tables during data population (including the error processing tables).
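
For instance, the two tables might look something like this (just an illustrative sketch, assuming Oracle-style SQL; the names and columns are not a standard, adjust them to your own requirements):

    -- Error processing table: collects rejected rows with their error codes.
    CREATE TABLE etl_error_log (
        batch_id      NUMBER,
        job_name      VARCHAR2(100),
        error_code    VARCHAR2(20),
        error_message VARCHAR2(4000),
        rejected_row  VARCHAR2(4000),
        logged_at     DATE
    );

    -- Audit table: row count, start time and end time per table load.
    CREATE TABLE etl_audit_log (
        batch_id    NUMBER,
        table_name  VARCHAR2(100),
        row_count   NUMBER,
        start_time  DATE,
        end_time    DATE
    );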

Posted: Wed Jan 03, 2007 5:10 pm
by ray.wurlod
You build these tables according to the specifications of your customer. What information do they want recorded? Once you have those answers, you can figure out how to capture these from DataStage and elsewhere, and create tasks for recording them into the tables.

Posted: Thu Jan 04, 2007 2:03 am
by pradkumar
Right now I am trying to build an audit table which consists of the following information:

1) Batch number of the daily run
2) Extraction start time
3) Extraction end time
4) Time elapsed

So where do I insert this link in the job, and how do I populate these columns? This table will be an output link from a Transformer (I guess).

Posted: Thu Jan 04, 2007 2:08 am
by kumar_s
Not necessarily; it can even be done with Before/After SQL. In the before-SQL of the input stage, you can insert the record with the start time as SYSDATE and the current batch id as a job parameter passed in from the job sequence or the main job control. And in the after-SQL you can populate SYSDATE for the end time and EndDate - StartDate as the elapsed time.
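
As a rough sketch only (assuming Oracle-style SQL, the illustrative etl_audit_log table from earlier in the thread, and a job parameter referenced with the usual #...# notation), the before/after SQL could be something like:

    -- Before-SQL on the input stage: open the audit record for this run.
    -- #BatchNumber# is the job parameter passed from the sequence or job control.
    INSERT INTO etl_audit_log (batch_id, table_name, start_time)
    VALUES (#BatchNumber#, 'MY_TARGET_TABLE', SYSDATE);

    -- After-SQL: close the audit record; elapsed time is end_time - start_time.
    UPDATE etl_audit_log
       SET end_time = SYSDATE
     WHERE batch_id   = #BatchNumber#
       AND table_name = 'MY_TARGET_TABLE';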

Posted: Thu Jan 04, 2007 4:49 am
by ray.wurlod
You never want to do it IN the job.

Do it afterwards, so that you don't slow the job itself.

Posted: Thu Jan 04, 2007 11:46 am
by pradkumar
Ray,

In your reply you said "Do it afterwards, so that you don't slow the job itself."

Does "afterwards" mean I should populate my audit table in the target stage? Or were you referring to something else?

Posted: Thu Jan 04, 2007 12:09 pm
by Pavan_Yelugula
Hi
Ray is just suggesting that you capture the audit information outside the actual job (the main logic, whatever you are doing), because if you include the audit-capturing code in the actual job it will slow that job down. Hope you got it.

Thanks
Pavan

Posted: Thu Jan 04, 2007 3:02 pm
by ray.wurlod
I'd always advocate a separate job. Of course, you may need to detect and trap error information in the job itself, for such is the nature of these things, but simply redirect the error information into a file. The row counts and started/stopped times can be obtained subsequently. Use the times from active stages rather than from jobs (or both) if you're interested in discounting startup and shutdown overheads.
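
Once those counts have been loaded into the audit table by that separate job or script, the reconciliation itself is just a query, along these lines (again using the illustrative names from earlier in the thread):

    -- Compare the counts recorded for the source extract and the target load
    -- of the same batch; a non-zero difference flags a reconciliation problem.
    SELECT s.batch_id,
           s.row_count               AS source_rows,
           t.row_count               AS target_rows,
           s.row_count - t.row_count AS difference
      FROM etl_audit_log s
      JOIN etl_audit_log t
        ON t.batch_id = s.batch_id
     WHERE s.table_name = 'MY_SOURCE_TABLE'
       AND t.table_name = 'MY_TARGET_TABLE';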