
Multi-instance Sequencer failing with error code 255

Posted: Wed Jul 22, 2015 8:54 am
by neeraj
Hello,

I am facing an issue with a multi-instance Sequencer. This Sequencer calls a job which connects to a database and creates a file.

At any one time, we have 4 instances running in parallel. Most of the time there is no issue and all the instances execute successfully, but sometimes the Sequencer fails with error 255.

The command used to invoke the Sequencer is as below:
*****************************************************

The command is : /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -secfile /opt/IBM/InformationServer/Server/DSEngine/secfile -run -wait -jobstatus -param business_date=2015-07-08 -param batch_id=1011 -param Jp_batchrun_id=2015070822391 -param -param Jp_invoc_id=ACE101120150708223915 Project_test Sq_File.ACE101120150708223915
*****************************************************
The Parameter File Status is 255
*****************************************************

When we rerun it, it executes successfully.

I searched Google and the DSXchange site but did not find anything useful.

Please let me know whether we need to reconfigure some settings on the server.

Regards
Neeraj

Posted: Wed Jul 22, 2015 9:16 am
by chulett
This part concerns me:

-param Jp_batchrun_id=2015070822391 -param -param Jp_invoc_id=ACE101120150708223915

The bare param in your command line. I'd start by fixing that and see if it helps. Also, can you post the full, unedited errors you get when this fails?
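
A stray -param like that can be caught before the command is ever run. The following is only a sketch: check_params is a hypothetical helper, not part of dsjob, and it assumes every -param must be followed by a single name=value word.

```shell
#!/bin/sh
# Hypothetical helper (not part of dsjob): succeeds only if every
# -param argument is immediately followed by a name=value word.
check_params() {
    expect_value=0
    for arg in "$@"; do
        if [ "$expect_value" -eq 1 ]; then
            case "$arg" in
                -*)   return 1 ;;          # another option where name=value was expected
                ?*=*) expect_value=0 ;;    # looks like name=value
                *)    return 1 ;;          # no "=" at all
            esac
        elif [ "$arg" = "-param" ]; then
            expect_value=1
        fi
    done
    # A trailing -param with nothing after it is also malformed.
    [ "$expect_value" -eq 0 ]
}
```

In the script it could be called as check_params $ds_command || echo "malformed -param in command" before the command is executed.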

Posted: Wed Jul 22, 2015 9:48 am
by neeraj
Hello,

I am sorry for posting the wrong statement.

The shell script command is:

Prcss_Strt_Ts=$(date +%Y%m%d%H%M%S)
DSHOME=`cat /.dshome`
. $DSHOME/dsenv

ds_command=`echo "$DSHOME/bin/dsjob -secfile $DSHOME/secfile -run -wait -jobstatus -param business_date=$Bus_Dt -param batch_id=$Pgm_Batch_id -param Jp_batchrun_id=$Prcss_Strt_Ts -param Jp_invoc_id=$Src_Name$Pgm_Batch_id$Prcss_Strt_Ts $Project_Name Sq_File.$Src_Name$Pgm_Batch_id$Prcss_Strt_Ts"`
echo "The command is :" $ds_command >>$LogFile

The output is
************The Parameter file creation process Started*************
The command is : /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -secfile /opt/IBM/InformationServer/Server/DSEngine/secfile -run -wait -jobstatus -param business_date=2015-07-08 -param batch_id=1011 -param Jp_batchrun_id=2015070822391 -param Jp_invoc_id=ACE101120150708223915 Project_test Sq_File.ACE101120150708223915

The Parameter File Status is 255

**************************************************
STATUS REPORT FOR JOB: Px_Jb_File.ACE101120150708223915
Generated: 2015-07-08 22:39:51
Job start time=2015-05-20 18:30:39
Job end time=2015-07-08 22:39:51
Job elapsed time=1180:09:12
Job status=99 (Not running)The Parameter file Job has Failed.


Px_Jb_File is the multi-instance DataStage job which is invoked by the Sequencer.
When we checked the Director log, there was no such reference available.

Posted: Wed Jul 22, 2015 10:30 am
by chulett
Where are the actual errors from the sequence job's log? That is what I meant, not just what your script reports.

Posted: Wed Jul 22, 2015 11:49 am
by neeraj
The output is
************The Parameter file creation process Started*************
The command is : /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -secfile /opt/IBM/InformationServer/Server/DSEngine/secfile -run -wait -jobstatus -param business_date=2015-07-08 -param batch_id=1011 -param Jp_batchrun_id=2015070822391 -param Jp_invoc_id=ACE101120150708223915 Project_test Sq_File.ACE101120150708223915

The Parameter File Status is 255

**************************************************
STATUS REPORT FOR JOB: Px_Jb_File.ACE101120150708223915
Generated: 2015-07-08 22:39:51
Job start time=2015-05-20 18:30:39
Job end time=2015-07-08 22:39:51
Job elapsed time=1180:09:12
Job status=99 (Not running)The Parameter file Job has Failed.

Posted: Wed Jul 22, 2015 11:50 am
by neeraj
This is what I got in the log file

Posted: Wed Jul 22, 2015 12:10 pm
by chulett
Okay... again, not interested so much in your log file. Looking for the log entries from the job itself, i.e. from the Director client. Not from your script.

Only other thing to note right now is if it fails and you do nothing other than run it again later and it succeeds then you have a resource issue or contention. Perhaps two things are running at the same time that shouldn't be or you have "too many" things running at the same time.
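
If it really is a transient contention problem, one workaround is to retry the launch a few times from the calling script. This is a minimal sketch, not a fix for the underlying cause. run_with_retry is a hypothetical helper; it assumes the command is dsjob with -jobstatus, where exit codes 1 and 2 mean the job finished OK or finished with warnings, and it retries only on 255, which is a design choice here and not something the thread confirms.

```shell
#!/bin/sh
# Hypothetical helper: run_with_retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Returns 0 when the command exits with 1 or 2 (job OK / job OK with warnings
# under dsjob -jobstatus), retries on 255, and passes any other code through.
run_with_retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        rc=0
        "$@" || rc=$?
        case "$rc" in
            1|2) return 0 ;;        # finished OK or with warnings
            255) ;;                 # assumed transient: fall through and retry
            *)   return "$rc" ;;    # anything else is not retried
        esac
        if [ "$attempt" -ge "$max_attempts" ]; then
            return "$rc"
        fi
        echo "Attempt $attempt returned $rc; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

With the script posted earlier this would be called as run_with_retry 3 30 $ds_command. Retrying only hides the problem, so the job's own log entries are still needed to find the real cause.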

Posted: Wed Jul 22, 2015 12:49 pm
by neeraj
The job was not executed, so how can we have a Director log for it?

Posted: Wed Jul 22, 2015 1:43 pm
by chulett
:?

Your post noted that it failed, not that it failed to run. They are not the same thing, hence my questions. So... which way did it go?

1. The Sequence job starts and fails to start something inside it, which fails the Sequence?
2. The Sequence job starts and runs something inside it, which fails and thus fails the Sequence?
3. The Sequence job itself fails to start?
4. Something else entirely?

Just want to make sure we understand the issue.

Posted: Wed Jul 22, 2015 2:17 pm
by neeraj
The Sequencer itself fails to start

Posted: Wed Jul 22, 2015 3:35 pm
by ray.wurlod
Curious that it failed to create the security file. The dsjob command, if I understand correctly, expects the security file to exist when this option is given.

Posted: Wed Jul 22, 2015 9:17 pm
by chulett
Any chance that when two invocations of the job go after the same "security file" at the same time, one would block the other?

Posted: Thu Jul 23, 2015 3:04 am
by ShaneMuir
Is it possible that your sequence is generating the same invocation id for multiple jobs?
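
The script posted earlier builds the invocation id from the source name, the batch id and a timestamp with one-second resolution, so two launches for the same source and batch within the same second would get the same id. A minimal sketch of one way to avoid that, assuming each instance is started by a separate run of the shell script and that an underscore and digits are acceptable in an invocation id (worth checking for your version). make_invocation_id is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: make_invocation_id SOURCE_NAME BATCH_ID
# Appends the PID of the calling shell so that two script runs started in
# the same second still produce different ids.
make_invocation_id() {
    printf '%s%s%s_%s\n' "$1" "$2" "$(date +%Y%m%d%H%M%S)" "$$"
}
```

For example, Invoc_Id=$(make_invocation_id "$Src_Name" "$Pgm_Batch_id") could then be used for both Jp_invoc_id and the Sq_File.<invocation id> part of the command. Two calls from the same script process within one second would still collide.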

Posted: Mon Aug 10, 2015 11:24 am
by neeraj
Hi,

You are right. It is possible that two different instances try to access the Authfile/security file at the same time, as the multiple instances run in parallel.

But this is how the job is designed, i.e. to run in parallel.

How can we handle such a scenario?

Regards
Neeraj
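
If the security file really is the point of contention (nothing in this thread confirms that), one option is to serialise the step that touches it. The following is a rough sketch using a lock directory, relying on mkdir being atomic. with_lock and the lock path are hypothetical, and there is no handling of stale locks left behind by a killed process.

```shell
#!/bin/sh
# Hypothetical helper: with_lock LOCK_DIR command [args...]
# Only one process at a time can create LOCK_DIR, so the wrapped command
# never runs concurrently with another command wrapped by the same lock.
with_lock() {
    lock_dir=$1
    shift
    # Wait until we are the process that manages to create the directory.
    until mkdir "$lock_dir" 2>/dev/null; do
        sleep 1
    done
    rc=0
    "$@" || rc=$?
    rmdir "$lock_dir"       # release the lock
    return "$rc"
}
```

Note that wrapping the whole -run -wait call this way would hold the lock until the job finishes, so the instances would no longer run in parallel. It only makes sense around a short launch step.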

Posted: Mon Aug 10, 2015 12:09 pm
by priyadarshikunal
neeraj wrote:
**************************************************
STATUS REPORT FOR JOB: Px_Jb_File.ACE101120150708223915
Generated: 2015-07-08 22:39:51
Job start time=2015-05-20 18:30:39
Job end time=2015-07-08 22:39:51
Job elapsed time=1180:09:12
Job status=99 (Not running)The Parameter file Job has Failed.

Well, this report concerns me more. It seems that when you run the job, it is failing for some other reason, and when it fails the script does an exit 255. Do check the job and script logs in detail.
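
To make the script log more useful, the status returned by dsjob can be translated before the script exits. This is a sketch only: describe_dsjob_status is a hypothetical helper, the meanings of 1, 2, 3 and 99 are the usual DataStage job status values (99 matches the "Not running" in the report above), and the reading of 255 as a dsjob-level error of -1 is an assumption to verify against your documentation.

```shell
#!/bin/sh
# Hypothetical helper: describe_dsjob_status EXIT_CODE
# Prints a short description of the exit code of "dsjob -run -jobstatus".
describe_dsjob_status() {
    case "$1" in
        1)   echo "job finished OK" ;;
        2)   echo "job finished with warnings" ;;
        3)   echo "job aborted" ;;
        99)  echo "job not running" ;;
        255) echo "dsjob error (assumed: -1 seen as an unsigned byte)" ;;
        *)   echo "unexpected status $1" ;;
    esac
}
```

In the script this could be logged right after the dsjob call, for example echo "dsjob returned $rc: $(describe_dsjob_status "$rc")" >>$LogFile, where rc holds the exit code captured from the dsjob command.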