
Restart a job

Posted: Tue May 19, 2009 12:07 am
by amit_dwh
Hi

I was seeking some suggestions for designing a job restartability scenario.

Scenario :
I have a DS job consisting of StageA, StageB and StageC. Suppose that, for 100 incoming rows, StageA and StageB have finished processing all 100 records but the job fails at StageC.
Is it possible to reuse the data already processed by StageA or StageB and feed it directly to StageC?

I was wondering if processed link data from StageB could be reused and sent to StageC.

I do not want to use intermediate Data Sets as they increase I/O.

Any suggestions are highly appreciated.

Thanks

Posted: Tue May 19, 2009 12:15 am
by ray.wurlod
No. Once a stage failure causes the job to stop, all virtual Data Sets (data in memory) are discarded. If you want to preserve them then you DO need intermediate storage (which implies a separate job for Stage C at the very least).
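To illustrate the intermediate-storage pattern described above, here is a minimal sketch in plain Python. It is not DataStage code: the stage functions, the JSON checkpoint file and every name in it are hypothetical stand-ins, assuming only that the StageB output can be written to disk and read back by a separate downstream job.

```python
import json
import os

# Counts how often the upstream job runs (illustration only).
UPSTREAM_RUNS = {"count": 0}

def stage_a(rows):
    # Hypothetical transformation standing in for StageA.
    return [{"id": r, "value": r * 2} for r in rows]

def stage_b(rows):
    # Hypothetical transformation standing in for StageB.
    return [dict(r, value=r["value"] + 1) for r in rows]

def stage_c(rows, fail=False):
    # Hypothetical final stage; `fail` simulates the StageC failure.
    if fail:
        raise RuntimeError("StageC failed")
    return sum(r["value"] for r in rows)

def run_upstream(rows, checkpoint_path):
    """Job 1: run StageA and StageB, then persist the StageB output."""
    UPSTREAM_RUNS["count"] += 1
    out = stage_b(stage_a(rows))
    # Write to a temporary file and rename it, so a crash
    # never leaves a half-written checkpoint behind.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(out, f)
    os.replace(tmp_path, checkpoint_path)

def run_downstream(checkpoint_path, fail=False):
    """Job 2: read the persisted data and run StageC only."""
    with open(checkpoint_path) as f:
        rows = json.load(f)
    result = stage_c(rows, fail=fail)
    # Remove the checkpoint only after StageC has succeeded.
    os.remove(checkpoint_path)
    return result

def run_with_restart(rows, checkpoint_path, fail_c=False):
    """Skip the upstream job when a checkpoint from an earlier run exists."""
    if not os.path.exists(checkpoint_path):
        run_upstream(rows, checkpoint_path)
    return run_downstream(checkpoint_path, fail=fail_c)
```

If the first call fails in `stage_c`, the checkpoint file stays on disk, so a second call to `run_with_restart` with the same path goes straight to the downstream job. The cost is the extra write and read that the original question hoped to avoid.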

There *may* be something clever you could do with message queues, but you would need to provide for draining these on successful completion.

Posted: Tue May 19, 2009 12:24 am
by amit_dwh
Thanks Ray.

That's what I was trying to figure out: whether I could get access to the virtual Data Sets. But as they are released when the job terminates, we have to do the work in bits and pieces.