Job Sequencer status Finished

Posted: Wed Dec 24, 2008 1:59 am
by s_porkalai
I have two sequencers, Child sequencer (sq_Child05) and Parent sequencer (sq_Parent).


   Parent sequencer
          |
          |
    Child sequencer
      |         |
      |         |
    jb_16     jb_25

The child sequencer calls two jobs, jb_16 and jb_25. In the child sequencer I'm using an Exception Handler stage that goes to a Notification Activity stage. I enabled "Add checkpoint so sequencer is restartable on failure" and "Automatically handle activities that fail", and I disabled "Do not checkpoint run".

The Parent sequencer calls the child sequencer and has no Exception Handler stage. I enabled "Add checkpoint so sequencer is restartable on failure" and "Automatically handle activities that fail", and I disabled "Do not checkpoint run".


Problem:
In the first run, job jb_16 aborted, and consequently the Child and Parent sequencers also aborted. When I restarted the Parent sequencer (which was in the Aborted/restartable state), job jb_16 aborted again, but the Child and Parent sequencers ended with a Finished status.

Question:
Why do the Child and Parent sequencers finish (during the second, restarted run) even though the job aborts?
How can I make the Child and Parent sequencers end in the Aborted/restartable state?


First run:
===========

Starting Job sq_Child05.
Environment variable settings:(...)
sq_Child05..JobControl (@Coordinator): Starting new run of checkpointed Sequence job
sq_Child05..JobControl (@ec_CreateConfig_05): Executed: $APT_GRID_HOME/sequencer.sh(...)
sq_Child05..JobControl (@ec_CreateConfig_05): Omitted checkpoint for execution of command '$APT_GRID_HOME/sequencer.sh'
sq_Child05 -> (jb_25): Job run requested(...)
sq_Child05..JobControl (DSRunJob): Waiting for job jb_25 to start
sq_Child05 -> (jb_16): Job run requested(...)
sq_Child05..JobControl (DSRunJob): Waiting for job jb_16 to start
sq_Child05..JobControl (DSWaitForJob): Job jb_16 has finished, status = 3 (Aborted)
sq_Child05..JobControl (@JA_RT16): Job jb_16 did not finish OK, status = 'Aborted'
sq_Child05..JobControl (@JA_RT16): Report on job: jb_16(...)
sq_Child05..JobControl (@JA_RT16): Controller problem: Unhandled abort encountered in job jb_16
sq_Child05..JobControl (@JA_RT16): Will execute error activity: EH_Verify_Error
sq_Child05..JobControl (DSSendMail): Sent message to 'abc@xyz.com'
sq_Child05..JobControl (DSWaitForJob): Waiting for job jb_25 to finish
sq_Child05..JobControl (DSWaitForJob): Job jb_25 has finished, status = 1 (Finished OK)
sq_Child05..JobControl (@JA_RT25): Report on job: jb_25(...)
sq_Child05..JobControl (@JA_RT25): Checkpointed run of job 'jb_25'
sq_Child05..JobControl (@Coordinator): Summary of sequence run(...)
sq_Child05..JobControl (fatal error from @Coordinator): Sequence job (restartable) will abort due to previous unrecoverable errors
Attempting to Cleanup after ABORT raised in stage sq_Child05..JobControl
(sq_Parent) <- sq_Child05: Job under control finished.



Second run:
============

Starting Job sq_Child05.(...)
Environment variable settings:(...)
sq_Child05..JobControl (@Coordinator): Sequence job is being restarted after failure(...)
sq_Child05..JobControl (@ec_CreateConfig_05): Executed: $APT_GRID_HOME/sequencer.sh(...)
sq_Child05..JobControl (@ec_CreateConfig_05): Omitted checkpoint for execution of command '$APT_GRID_HOME/sequencer.sh'
sq_Child05..JobControl (@JA_RT25): Skipped run of job 'jb_25' on restart
sq_Child05..JobControl (DSPrepareJob): Attempting to reset failed job jb_16
sq_Child05 -> (jb_16): Job reset requested
sq_Child05..JobControl (DSRunJob): Waiting for job jb_16 to start
sq_Child05..JobControl (DSWaitForJob): Waiting for job jb_16 to finish
sq_Child05..JobControl (DSWaitForJob): Job jb_16 has finished, status = 21 (Has been reset)
sq_Child05 -> (jb_16): Job run requested(...)
sq_Child05..JobControl (DSRunJob): Waiting for job jb_16 to start
sq_Child05..JobControl (DSWaitForJob): Waiting for job jb_16 to finish
sq_Child05..JobControl (DSWaitForJob): Job jb_16 has finished, status = 3 (Aborted)
sq_Child05..JobControl (@JA_RT16): Job jb_16 did not finish OK, status = 'Aborted'
sq_Child05..JobControl (@JA_RT16): Report on job: jb_16(...)
sq_Child05..JobControl (@JA_RT16): Controller problem: Unhandled abort encountered in job jb_16
sq_Child05..JobControl (@JA_RT16): Controller problem: Unhandled abort encountered in job jb_16
sq_Child05..JobControl (@JA_RT16): Will execute error activity: EH_Verify_Error
sq_Child05..JobControl (DSSendMail): Sent message to 'abc@xyz.com'(...)
sq_Child05..JobControl (@Coordinator): Summary of sequence run(...)
Finished Job sq_Child05.
(sq_Parent) <- sq_Child05: Job under control finished.

Posted: Wed Dec 24, 2008 3:02 am
by dhanashreepanse
I assume that jb_16 and jb_25 are the Job Activity stages.

In these 2 stages and the child sequence, what option have you set for Execution Action?

Posted: Wed Dec 24, 2008 3:25 am
by s_porkalai
Yes, jb_16 and jb_25 are Job Activity stages.

The Execution Action has been set to "Reset if required, then run" for the two Job Activity stages and the child sequence.

Posted: Wed Dec 24, 2008 3:42 am
by ray.wurlod
Go to the job sequence's job properties, select "Automatically handle activities that fail" and re-compile. Without this set, jobs under control can abort and the sequence finish with an OK status - you have to inspect the job sequence's log to determine the cause. With it checked, and without explicit error handling in the job sequence itself, the job sequence will abort if any of its activities reports a failure.
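
As a rough illustration of what "inspecting the job sequence's log" could look like when done by script, here is a minimal Python sketch. It assumes the sequence log has already been exported to plain text by some means, and that the "has finished, status = N (...)" lines look like the ones posted earlier in this thread. The function names and the sample string are hypothetical and are not part of DataStage.

```python
import re

# Matches log lines such as:
#   sq_Child05..JobControl (DSWaitForJob): Job jb_16 has finished, status = 3 (Aborted)
FINISHED_RE = re.compile(
    r"Job (?P<job>\S+) has finished, status = (?P<code>\d+) \((?P<text>[^)]*)\)"
)

# Status codes exactly as they appear in the logs posted in this thread.
STATUS_FINISHED_OK = 1
STATUS_ABORTED = 3
STATUS_RESET = 21


def final_job_statuses(log_text):
    """Return {job_name: (code, text)}, keeping the last status reported per job."""
    statuses = {}
    for line in log_text.splitlines():
        match = FINISHED_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so a reset followed by
            # an abort ends up recorded as an abort.
            statuses[match["job"]] = (int(match["code"]), match["text"])
    return statuses


def aborted_jobs(log_text):
    """Return the sorted names of jobs whose last reported status is Aborted."""
    return sorted(
        job
        for job, (code, _text) in final_job_statuses(log_text).items()
        if code == STATUS_ABORTED
    )


# Hypothetical sample: three lines copied from the second-run log above.
SECOND_RUN_LOG = """\
sq_Child05..JobControl (DSWaitForJob): Job jb_16 has finished, status = 21 (Has been reset)
sq_Child05..JobControl (DSWaitForJob): Job jb_16 has finished, status = 3 (Aborted)
Finished Job sq_Child05.
"""

if __name__ == "__main__":
    print(aborted_jobs(SECOND_RUN_LOG))
```

A wrapper script could treat a non-empty result from aborted_jobs() as a failure even when the sequence itself reports Finished. This only detects the mismatch; it does not change the sequence's own status.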

Posted: Wed Dec 24, 2008 4:36 am
by dhanashreepanse
Try one more approach:
Don't run the parent sequence...just run the child sequence.
Observe if you get the same result.
This would help you to identify where the exact problem is.

Posted: Fri Dec 26, 2008 1:26 am
by s_porkalai
Hi Ray,
The option "Automatically handle activities that fail" has been set in both the Parent and Child sequences. The Parent sequencer still ends up Finished (i.e., during the second run). I want the Parent sequencer to be in the Aborted/restartable state whenever a job aborts.

Posted: Fri Dec 26, 2008 1:29 am
by s_porkalai
Hi dhanashreepanse,
I tried running the Child sequencer alone. The sequence aborts in the first run and finishes in the second run (i.e., after I restart the sequence), even though the job is in the Aborted state.

Posted: Fri Dec 26, 2008 9:24 am
by kandyshandy
Porkalai, what do you have on the trigger page of the first job in your child sequence? I guess that is where the problem is.

Posted: Fri Dec 26, 2008 11:47 pm
by s_porkalai
Hi,
Neither of the two jobs has any triggers set, and both run in parallel. The Exception Handler stage is independent and is not connected to jb_16 or jb_25.

You can see in the first run's job log that the child sequence raised an ABORT ("Attempting to Cleanup after ABORT raised in stage sq_Child05..JobControl"), but in the second run's job log there is no abort entry.

I want to know why the Child sequence does not abort in the second run.
Is there any way to abort the Child sequence whenever a job aborts?

Posted: Sat Dec 27, 2008 4:09 pm
by kandyshandy
Just do a test. Keep them running in parallel, but add some dummy process after the two jobs, with triggers on the output links of both jobs. For your scenario, you might need some "sequencing" so that the selected options will work.

Posted: Mon Jan 05, 2009 12:34 am
by dhanashreepanse
Hi

Not sure if you found a solution to your problem, but here's something that might be useful:

(This is from DS Director manual)
If, during sequence execution, the flow diverts to an error handling
stage, DataStage does not checkpoint anything more. This is to
ensure that stages in the error handling path will not be skipped if
the job is restarted and another error is encountered.