Sequence job aborted after 'Waiting for job to start'

Palermo · Post by **Palermo** » Sun Jul 02, 2017 3:45 pm

Hi all,

I'm facing a serious problem. Sequence jobs have been aborting all week, seemingly at random, across different sequence jobs. Before this, the jobs had run fine for the last 6 months.

Here is an example from the log. As you can see, TRIGGERS_JobSeq (1) started PRCSSD_TRGGR_IND_SET_Y_PJob (2), but (1) aborted while (2) finished. Startup wait time = 120 seconds.

After rerunning (1) they both finished successfully:

What we have done so far:
1) Restarted the DS server.
2) Changed DSWaitResetStartup and DSWaitStartup to 120 (although the log doesn't show any timeout-related errors. If a timeout is the problem, why not?)

Please advise how to fix this. Many thanks in advance for your help.

UCDI · Post by **UCDI** » Mon Jul 03, 2017 6:46 am

Did anything else change? More jobs running, a server OS update, anything like that? How many jobs are running, and how many are allowed (Operations Console)?

chulett · Post by **chulett** » Mon Jul 03, 2017 7:00 am

Ah yes, the proverbial question - what changed? Obviously something did.

Typically, when I see someone post about seemingly random issues with jobs not starting within the timeout limit but then running fine later, it is almost always an indication of a resource issue on the server. So I have the same questions UCDI posted...

Palermo · Post by **Palermo** » Mon Jul 03, 2017 10:06 am

UCDI,

Currently 44 jobs are running. The OS was not updated. Workload Management was disabled 1.5 years ago, and I am not sure whether the following parameters limit the number of running jobs: T30FILE=4096, RLTABSZ=480 (maximum running jobs = 900).

chulett, I agree with you. The support team opened a PMR ticket to monitor and assess server resources.

Thanks.

chulett · Post by **chulett** » Wed Jul 05, 2017 7:10 am

Those settings tend to limit the number of running jobs by causing any jobs over the limit to blow up... and throw very specific errors pointing to them as the culprit, from what I recall. They're not the issue.

And realize that the "resource issue" isn't confined to just what DataStage things are running on the server...

Palermo · Post by **Palermo** » Thu Jul 06, 2017 2:00 am

The support team reported that the cause was GSKit. The problem is now solved.

chulett · Post by **chulett** » Thu Jul 06, 2017 6:49 am

GSKit? Not something that's ever been posted here before, can you (or anyone else) elaborate a bit? Thanks.

JRodriguez · Post by **JRodriguez** » Thu Jul 06, 2017 6:05 pm

The latest versions of IIS use the Global Security Kit (GSKit) by default for both encryption and SSL communication.

It would be nice to learn more about the root cause and the solution. I could foresee issues if two versions of GSKit were installed on the server (two DS versions side by side with itag?, or other IBM products using a different version of GSKit)...

Palermo · Post by **Palermo** » Mon Jul 10, 2017 12:57 pm

I don't know the details because I am a developer and I was not involved in solving the problem.