multiple invocations of a multi instance job failing

panchusrao2656
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

We have a multi-instance AUDIT job that runs for every job, collects the job stats, and loads them into the AUDIT tables. The job sometimes fails when a couple of instances of it are running at the same time under different invocation IDs. We get the following error:

Error calling DSRunJob(SEQX_ROUTINE_SAVE_JOB_INFO.J040_CIMS_MPI_SDS_PHYSICIANS_ODS_O_CUST_ID_test.CCC_sAUDIT_SK), code=-2 [Job is not in the right state (compiled and not running)]


To test parallel invocation of a multi-instance job, I created a test job sequence that calls a multi-instance job with five different invocation IDs. Sometimes the sequence completes successfully, whereas other times it fails because one of the invocation calls returns job status 99.

Do we need to set something at the project level to support multi-instance jobs? Please share if you have faced this issue earlier.

SEQX_PARALLEL_AUDIT_test321..JobControl (@Coordinator): Summary of sequence run
12:06:43: Sequence started (checkpointing on)
12:06:43: EEE (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.EEE) started
12:06:45: DDD (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.DDD) started
12:06:48: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) started
12:06:50: BBB (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.BBB) started
12:06:52: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) started
12:10:17: CCC (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.CCC) finished, status=2 [Finished with warnings]
12:10:18: AAA (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA) finished, status=99 [Not running]
12:10:19: Exception raised: @AAA, Unhandled abort encountered in job J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA
12:10:22: EEE (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.EEE) finished, status=2 [Finished with warnings]
12:10:23: DDD (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.DDD) finished, status=2 [Finished with warnings]
12:10:24: BBB (JOB J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.BBB) finished, status=2 [Finished with warnings]
12:10:24: Sequence failed (restartable)
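
A sequence summary like the one above can be scanned automatically for invocations that did not finish cleanly. The following is only a minimal sketch: the helper names (`finished_events`, `failed_invocations`) are hypothetical, the line layout is inferred from this one log, and treating statuses 1 and 2 as acceptable is an assumption.

```python
import re
from typing import NamedTuple

# Matches summary lines of the form seen in this thread, e.g.
#   12:10:18: AAA (JOB <job>.AAA) finished, status=99 [Not running]
FINISHED = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}): (?P<activity>\S+) \(JOB (?P<job>\S+)\) "
    r"finished, status=(?P<status>\d+) \[(?P<text>[^\]]*)\]$"
)

# Assumption: 1 (finished OK) and 2 (finished with warnings) are acceptable.
ACCEPTABLE = {1, 2}


class Finish(NamedTuple):
    time: str
    job: str
    status: int
    text: str


def finished_events(summary: str) -> list[Finish]:
    """Return one record per 'finished' line in a sequence summary."""
    events = []
    for line in summary.splitlines():
        m = FINISHED.match(line.strip())
        if m:
            events.append(Finish(m["time"], m["job"], int(m["status"]), m["text"]))
    return events


def failed_invocations(summary: str) -> list[Finish]:
    """Return the finished invocations whose status is not acceptable."""
    return [e for e in finished_events(summary) if e.status not in ACCEPTABLE]


# Excerpt of the summary posted above.
JOB = "J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123"
SAMPLE = f"""\
12:06:52: AAA (JOB {JOB}.AAA) started
12:10:17: CCC (JOB {JOB}.CCC) finished, status=2 [Finished with warnings]
12:10:18: AAA (JOB {JOB}.AAA) finished, status=99 [Not running]
12:10:19: Exception raised: @AAA, Unhandled abort encountered in job {JOB}.AAA
12:10:22: EEE (JOB {JOB}.EEE) finished, status=2 [Finished with warnings]
"""

if __name__ == "__main__":
    for event in failed_invocations(SAMPLE):
        print(event.time, event.job, event.status, event.text)
```

Run against the excerpt, this flags only the AAA invocation, which is the one that came back with status 99.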

Thank You
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What does the job log of job J100_ODS_CONFORMANCE_SMS_CUST_ADDR_TLPHN_4654_SP_test123.AAA reveal?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
panchusrao2656
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

The job invocation itself is failing, and I cannot see its log in the Director.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's odd, because there's a "started" message in the sequence log and there are more than three minutes between that and the warning event. Could anyone perhaps have tried to recompile the job in this time? Can you please post detail of the "job run requested" event for this invocation?
panchusrao2656
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

As I am calling the same job with different invocation IDs, there is no change to the job.

We have set Auto Purge of the job log to 6 runs at the project level in Administrator. I tried changing this option to 21 and still have the same issue.

The other strange thing I have noticed is that when the jobs are invoked from the job sequence, I see five instances running, and at the end one invocation sometimes disappears. For that invocation, the sequence reports status 99. This does not happen every time.

I tried running the job with five invocations without using the job sequence, and all of them finish properly (each instance takes roughly 3 minutes to complete).

I have no idea what is happening.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Isn't there some sort of odd MI bug that enabling auto-purge of the logs creates that people have reported? What happens if you disable auto-purge for this job? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
panchusrao2656
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

I tried the option of purging logs until yesterday for this job, but I am getting the same thing. I see the weird scenario that I mentioned earlier, and I have captured screenshots of the Director: one showing all 5 instances running initially, one showing the parent job reporting that one of the instances returned status 99, and a last one showing the log for only four instances of the job.

I do not know how to attach them to this topic.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Images cannot be "attached" here. Rather you need to upload them somewhere else and then link them to a post here using the [img] or "image tags". Lots of sites available to do free file sharing / hosting, if you feel the need to show us your screenshots.

Can't tell from what you posted, did you try turning off auto-purge for this job to see if it makes any difference?
-craig

Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

There are some problems with multi-instance jobs in 8.0.1. I currently have a PMR open with IBM.

It is not entirely related to auto purging.

Since multi-instance jobs all share the same RT_LOG and RT_STATUS files, it seems to be some kind of timing issue when multiple instances are hitting these tables concurrently.
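
To make the kind of timing issue suspected here concrete, the sketch below replays a classic lost-update interleaving on a shared status table. It is a toy model only: the function names are hypothetical, and nothing here is claimed to describe how the DataStage engine actually reads or writes RT_STATUS.

```python
def read_table(shared: dict) -> dict:
    """An instance takes a private snapshot of the shared status table."""
    return dict(shared)


def write_table(shared: dict, snapshot: dict, invocation: str, status: str) -> None:
    """The instance adds its own entry and writes the whole snapshot back.

    Writing back the whole snapshot discards anything other instances
    stored after this snapshot was taken.
    """
    snapshot[invocation] = status
    shared.clear()
    shared.update(snapshot)


def interleaved() -> dict:
    """Both instances read before either writes: one update is lost."""
    shared: dict = {}
    snap_a = read_table(shared)
    snap_b = read_table(shared)
    write_table(shared, snap_a, "AAA", "finished")
    write_table(shared, snap_b, "BBB", "finished")
    return shared


def serialized() -> dict:
    """Each instance completes its read and write before the next starts."""
    shared: dict = {}
    write_table(shared, read_table(shared), "AAA", "finished")
    write_table(shared, read_table(shared), "BBB", "finished")
    return shared


if __name__ == "__main__":
    print("interleaved:", interleaved())
    print("serialized: ", serialized())
```

In the interleaved case the AAA entry vanishes from the shared table even though that instance did its work, which resembles the "disappearing" invocation described earlier in the thread. Whether the real cause is of this kind is not established here.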

Sometimes a job sequence will abort with a status=99 error. Sometimes everything will finish with status OK and no active stages actually executing.

I applied one patch from IBM that was supposed to fix the timing issues related to the status=99 problem, but it has been ineffective.

Mike
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Since this seems to be a known issue, best to contact your official support provider and see about getting the patch(es) Mike is talking about.
-craig

priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

If you look into the 8.0.1 Fix Pack 2 release notes, there are a lot of fixes (more than 300) developed by IBM to resolve problems in the earlier release.

The same issue is mentioned in that release.

The problem is not only with purging the log entries; it is an issue with auto-purge itself. When jobs run concurrently and auto-purge is active, status 99 is returned intermittently.

@Mike

You should check the release notes to verify, by eCase number and description, that your patch is the same one.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

From what we've seen here, best to get off of 8.0.1, fix packs or no, and on to 8.1 at your earliest convenience.
-craig

panchusrao2656
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

Thank you all for sharing your ideas and info. I will ask our admin to raise a ticket with IBM to get the patch.
telenet_bi
Premium Member
Posts: 33
Joined: Wed Jul 23, 2008 7:33 am
Location: Mechelen, Belgium

Post by telenet_bi »

I think there are two parts to this issue. We had both of these issues separately:

- We had issues with jobs that run fine but still return status 99 to the flow or to the dsjob command that started them. There is a patch for this, but it didn't solve our problem completely (it helped, though).

- Multi-instance jobs: change auto-purging to work on number of days instead of number of runs. This worked for us.
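
For the first case, where a job runs fine but the caller still receives 99, one possible stopgap in a wrapper script is to re-check the status before treating the run as failed. This is only a sketch of that idea and not a fix mentioned in the thread: `run_job` and `query_status` are hypothetical callables that a site would have to supply, and the number of rechecks is arbitrary.

```python
from typing import Callable

NOT_RUNNING = 99  # the status reported in this thread for the affected invocation


def confirmed_status(
    run_job: Callable[[], int],
    query_status: Callable[[], int],
    rechecks: int = 3,
) -> int:
    """Run a job and re-query its status while the answer is still 99.

    run_job      -- hypothetical: starts the invocation, waits, returns a status
    query_status -- hypothetical: asks again for the invocation's current status
    """
    status = run_job()
    for _ in range(rechecks):
        if status != NOT_RUNNING:
            break
        # A real wrapper would probably pause briefly here before asking again.
        status = query_status()
    return status


if __name__ == "__main__":
    # Simulated run: the first answer is a spurious 99, later ones are 99 then 1.
    answers = iter([99, 1])
    print(confirmed_status(lambda: 99, lambda: next(answers)))
```

If every recheck still returns 99, the function returns 99 and the caller should treat the run as failed. This only masks the symptom; the patch and auto-purge changes discussed above address the cause.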
Rahul.r.s
Participant
Posts: 1
Joined: Thu Jan 15, 2009 11:01 pm
Location: Mumbai

Re: multiple invocations of a multi instance job failing

Post by Rahul.r.s »

Could you please let us know which patch you applied?
Was it one of those listed below?
patch_JR30015v4_server_aix_8011.tar
patch_JR30015v3_client_windows_8011.zip

We have an issue similar to yours!