multiple invocations of a multi instance job failing
Moderators: chulett, rschirm, roy
-
- Charter Member
- Posts: 64
- Joined: Sat Sep 17, 2005 10:42 am
We have a multi-instance AUDIT job that runs for each and every job, collects the job statistics, and loads them into the AUDIT tables. The job sometimes fails when a couple of instances of it are running at the same time under different invocation IDs. We got the following error:
Error calling DSRunJob(SEQX_ROUTINE_SAVE_JOB_INFO.J040_CIMS_MPI_SDS_PHYSICIANS_ODS_O_CUST_ID_test.CCC_sAUDIT_SK), code=-2 [Job is not in the right state (compiled and not running)]
To test parallel invocation of a multi-instance job, I created a test job sequence that calls a multi-instance job with five different invocation IDs. Sometimes the sequence completes successfully, whereas sometimes it fails because one of the invocations returns job status 99.
Do we need to set something at the project level to support multi-instance jobs? Please share if you have faced this issue earlier.
SEQX_PARALLEL_AUDIT_test321..JobControl (@Coordinator): Summary of sequence run
12:06:43: Sequence started (checkpointing on)
12:06:43: EEE (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.EEE) started
12:06:45: DDD (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.DDD) started
12:06:48: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) started
12:06:50: BBB (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.BBB) started
12:06:52: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) started
12:10:17: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) finished, status=2 [Finished with warnings]
12:10:18: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) finished, status=99 [Not running]
12:10:19: Exception raised: @AAA, Unhandled abort encountered in job J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA
12:10:22: EEE (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.EEE) finished, status=2 [Finished with warnings]
12:10:23: DDD (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.DDD) finished, status=2 [Finished with warnings]
12:10:24: BBB (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.BBB) finished, status=2 [Finished with warnings]
12:10:24: Sequence failed (restartable)
Thank You
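For anyone comparing several runs of a sequence like this, the failing invocation can be picked out of the summary mechanically. What follows is only a minimal Python sketch, not anything DataStage ships: the function name failed_invocations and the regular expression are made up for illustration, the line format is taken from the summary above, and status 1 is assumed to mean "finished OK" (the thread itself only shows 2 = finished with warnings and 99 = not running).

```python
import re

# Matches summary lines of the form shown in the post, e.g.
# 12:10:18: AAA (JOB <job>.<invocation>) finished, status=99 [Not running]
FINISHED = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}): (?P<activity>\S+) "
    r"\(JOB (?P<job>\S+)\) finished, status=(?P<status>\d+) \[(?P<text>[^\]]*)\]"
)


def failed_invocations(summary, ok_statuses=(1, 2)):
    """Return (activity, job, status, text) for every finished job
    whose status is not in ok_statuses.

    Assumption: 1 = finished OK, 2 = finished with warnings.
    """
    failures = []
    for line in summary.splitlines():
        m = FINISHED.match(line.strip())
        if m and int(m["status"]) not in ok_statuses:
            failures.append(
                (m["activity"], m["job"], int(m["status"]), m["text"])
            )
    return failures


# Three lines copied from the sequence summary in the post.
sample = """\
12:10:17: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) finished, status=2 [Finished with warnings]
12:10:18: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) finished, status=99 [Not running]
12:10:19: Exception raised: @AAA, Unhandled abort encountered in job J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA
"""

for activity, job, status, text in failed_invocations(sample):
    print(f"{activity}: {job} returned status {status} [{text}]")
```

Run against the three sample lines, this reports only the AAA invocation; the "Exception raised" line is ignored because it is not a "finished" line.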
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
That's odd, because there's a "started" message in the sequence log and there are more than three minutes between that and the warning event. Could anyone perhaps have tried to recompile the job in this time? Can you please post detail of the "job run requested" event for this invocation?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
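The gap between the "started" and "finished" events can be worked out from the sequence summary posted earlier in the thread. A minimal Python sketch follows, assuming the summary line format shown there and that the run does not cross midnight; the function name elapsed_seconds and the regular expression are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Matches both event lines of the summary, e.g.
# 12:06:52: AAA (JOB <job>.<invocation>) started
# 12:10:18: AAA (JOB <job>.<invocation>) finished, status=99 [Not running]
EVENT = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}): (?P<activity>\S+) "
    r"\(JOB \S+\) (?P<event>started|finished)"
)


def elapsed_seconds(summary):
    """Map each activity name to the number of seconds between its
    'started' and 'finished' lines.

    Assumption: all timestamps fall on the same day.
    """
    started = {}
    elapsed = {}
    for line in summary.splitlines():
        m = EVENT.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m["time"], "%H:%M:%S")
        if m["event"] == "started":
            started[m["activity"]] = stamp
        elif m["activity"] in started:
            delta = stamp - started[m["activity"]]
            elapsed[m["activity"]] = int(delta.total_seconds())
    return elapsed


# Four lines copied from the sequence summary in the first post.
sample = """\
12:06:48: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) started
12:06:52: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) started
12:10:17: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) finished, status=2 [Finished with warnings]
12:10:18: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) finished, status=99 [Not running]
"""

for activity, seconds in elapsed_seconds(sample).items():
    print(f"{activity}: {seconds} s")
```

For the posted run, the AAA invocation was reported as started 206 seconds before it came back with status 99, about the same as the invocations that finished normally.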
-
- Charter Member
- Posts: 64
- Joined: Sat Sep 17, 2005 10:42 am
As I am calling the same job with different invocation IDs, there has been no change to the job.
We have Auto Purge of the job log set to 6 runs at the project level in Administrator. I tried changing this option to 21 and still have the same issue.
The other strange thing I have noticed is that when the jobs are invoked from the job sequence, I see five instances running, and at the end one invocation sometimes disappears; for that invocation of the job, the sequence reports status 99. This does not happen every time.
I tried running the job with five invocations without using the job sequence, and all of them finish properly (each instance takes roughly 3 minutes to complete).
I have no idea what is happening.
-
- Charter Member
- Posts: 64
- Joined: Sat Sep 17, 2005 10:42 am
I tried the option of purging logs until yesterday for this job, but I am getting the same thing. I still see the weird scenario that I mentioned earlier, and I have captured screenshots of the Director: the first shows all five instances running initially, the next shows the parent job reporting that one of the instances returned status 99, and the last shows the log for only four instances of the job.
I do not know how to attach them to this topic.
Images cannot be "attached" here. Rather you need to upload them somewhere else and then link them to a post here using the [img] or "image tags". Lots of sites available to do free file sharing / hosting, if you feel the need to show us your screenshots.
Can't tell from what you posted, did you try turning off auto-purge for this job to see if it makes any difference?
-craig
"You can never have too many knives" -- Logan Nine Fingers
There are some problems with multi-instance jobs in 8.0.1. I currently have a PMR open with IBM.
It is not entirely related to auto purging.
Since multi-instance jobs all share the same RT_LOG and RT_STATUS files, it seems to be some kind of timing issue when multiple instances are hitting these tables concurrently.
Sometimes a job sequence will abort with a status=99 error. Sometimes everything will finish with status OK and no active stages actually executing.
I applied one patch from IBM that was supposed to fix the timing issues related to the status=99 problem, but it has been ineffective.
Mike
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
If you look into the 8.0.1 Fix Pack 2 release notes, there are a lot of fixes (more than 300) developed by IBM to resolve problems in the earlier release.
The same issue is mentioned in that release.
The problem is not only with purging the log entries; it is an issue with auto purge itself. When jobs are running concurrently and auto purge is active, it intermittently returns the value 99.
@Mike
You should look at the release notes to verify, by eCase number and description, that your patch is the same one.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
-
- Premium Member
- Posts: 33
- Joined: Wed Jul 23, 2008 7:33 am
- Location: Mechelen, Belgium
- Contact:
I think there are two parts to this issue. We had both of these issues separately:
- We had issues with jobs that ran fine but still returned status 99 to the flow or to the dsjob command that started them. There is a patch for this, but it didn't solve our problem completely (it helped, though).
- Multi-instance jobs: change auto-purging to work on number of days instead of number of runs. This worked for us.
Re: multiple invocations of a multi instance job failing
Could you please let us know which patch you applied?
Was it one of those listed below?
patch_JR30015v4_server_aix_8011.tar
patch_JR30015v3_client_windows_8011.zip
We have an issue similar to yours!