
Is there any Automation Process to handle Data

Posted: Mon Mar 30, 2009 1:57 am
by John Daniel
Hi All,

Please give me some inputs on this.

I have a source of 100 records, and 80 of those records were processed to the target. I got an error in the source at the 81st record, so I need to start the job from the 82nd record onwards.

Is there any automatic handling process for this kind of issue in DataStage (PX)?

Looking forward to your kind reply on this.

Regards,
John

Re: Is there any Automation Process to handle Data

Posted: Mon Mar 30, 2009 2:34 am
by Pagadrai
Hi,
I am not very clear on what your idea of automatic handling is.
You have a lot of approaches to do it anyway:

you can use a 'flag' to identify unprocessed records (a rough sketch follows below),
or use a change data capture stage to process only the changed ones.
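
To illustrate the 'flag' idea outside DataStage itself, here is a minimal Python/SQLite sketch; the table, column and function names are hypothetical and not anything from the product:

    # Minimal sketch of the 'flag' idea, assuming the source is a table with a
    # processed_flag-style column (all names here are hypothetical).
    import sqlite3

    def setup_demo(conn):
        # Create a tiny demo source so the sketch runs end to end.
        conn.execute("CREATE TABLE IF NOT EXISTS source "
                     "(id INTEGER PRIMARY KEY, payload TEXT, processed_flag INTEGER DEFAULT 0)")
        if conn.execute("SELECT COUNT(*) FROM source").fetchone()[0] == 0:
            conn.executemany("INSERT INTO source (payload) VALUES (?)",
                             [(f"record-{n}",) for n in range(1, 101)])
        conn.commit()

    def process_unprocessed(conn):
        # Only rows a previous (failed) run has not marked yet are picked up.
        for row_id, payload in conn.execute(
                "SELECT id, payload FROM source WHERE processed_flag = 0").fetchall():
            # ... deliver `payload` to the target here ...
            conn.execute("UPDATE source SET processed_flag = 1 WHERE id = ?", (row_id,))
            conn.commit()   # once flagged, a restart will skip this row

    if __name__ == "__main__":
        conn = sqlite3.connect("source_demo.db")
        setup_demo(conn)
        process_unprocessed(conn)

On a restart the SELECT simply returns the rows that were never flagged, so nothing that already reached the target is sent again.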

Posted: Mon Mar 30, 2009 2:42 am
by ray.wurlod
Welcome aboard.

There's nothing automatic, but you can design recovery.

As noted, designing recovery will require that you keep track of how far the job has got before it fails.
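
As an illustration of that tracking (outside DataStage itself), a minimal checkpoint sketch in Python might look like the following; the checkpoint file name and helper functions are hypothetical, and it assumes the records arrive in a stable order:

    # Minimal sketch of keeping track of how far the job got.
    import os

    CHECKPOINT = "job.checkpoint"   # hypothetical file name

    def read_checkpoint():
        # Returns the number of rows already delivered by a previous run.
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return int(f.read().strip() or "0")
        return 0

    def write_checkpoint(n):
        with open(CHECKPOINT, "w") as f:
            f.write(str(n))

    def run(records):
        done = read_checkpoint()
        for i, rec in enumerate(records, start=1):
            if i <= done:
                continue                 # already delivered before the failure
            # ... deliver `rec` to the target here ...
            write_checkpoint(i)          # record progress so a restart can resume

    if __name__ == "__main__":
        run([f"record-{n}" for n in range(1, 101)])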

Posted: Mon Mar 30, 2009 4:17 am
by BugFree
ray.wurlod wrote: Welcome aboard.

There's nothing automatic, but you can design recovery.

As noted, designing recovery will require that you keep track of how far the job has got before it fails. ...
Ray, are you referring to the other new post with the subject line "Disigning Recovery handling"? :? :D

Posted: Mon Mar 30, 2009 3:56 pm
by ray.wurlod
Look at the timestamps on the two posts. John has responded with a separate thread to ask a separate question, which is the way things should be.

There is no need to quote everything - it only wastes space.

Posted: Mon Mar 30, 2009 5:54 pm
by vmcburney
It's very risky to try and restart an ETL process that stopped midway through, and it is best avoided. For starters, it is hard to work out how much data was actually delivered to the target - for a database target you need to take into account array sizes and transaction sizes to discover how many rows were actually saved and how many were rolled back. While your job ended at the 81st row, it may have left the target table back at the 60th row. Second, the parallel job design and partitioning mean you don't know for sure which rows from the source have been processed. You may have one partition that is up to the 81st row while another has already processed the 82nd row.
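
To make the transaction-size point concrete, here is a rough Python sketch (purely illustrative, not DataStage) of how many rows survive a mid-run failure when only whole transactions are committed:

    # Rough illustration of why the target can lag behind the row the job
    # stopped on: partial transactions are rolled back on failure.
    def rows_committed(rows_processed, transaction_size):
        # Only full batches remain after a rollback of the open transaction.
        return (rows_processed // transaction_size) * transaction_size

    if __name__ == "__main__":
        # The job read 80 rows before failing on the 81st.
        for size in (10, 20, 30):
            print(f"transaction size {size}: {rows_committed(80, size)} rows committed")

With a transaction size of 30, the table is left back at row 60 even though the job stopped at row 81.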

It is safer to roll back your changes and process from the beginning, or better yet trap your bad data rows on a reject link and into an exceptions file so your job keeps processing. You can put reject links onto Sequential File stages, Lookups, Transformers or database target stages to trap errors rather than aborting the job.
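
As a rough illustration of the reject-link pattern in general terms (Python, not DataStage; the file names and the validation check are hypothetical):

    # Minimal sketch of routing bad rows to an exceptions file instead of aborting.
    def load(rows, target, rejects):
        for raw in rows:
            try:
                value = int(raw)              # stand-in for real validation/conversion
            except ValueError:
                rejects.write(raw + "\n")     # trap the bad row, do not abort
                continue
            target.write(f"{value}\n")        # good rows keep flowing to the target

    if __name__ == "__main__":
        data = ["1", "2", "oops", "4"]
        with open("target.out", "w") as tgt, open("exceptions.rej", "w") as rej:
            load(data, tgt, rej)

The bad row ends up in the exceptions file for later investigation while the rest of the data loads normally.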