DSWaitForJob waiting indefinitely

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't have an answer for you, but just wanted to compliment you on your post. If anyone needed a model for getting help on a problem, they have one now. Excellent. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
katz
Charter Member
Posts: 52
Joined: Thu Jan 20, 2005 8:13 am

Post by katz »

Thanks, Craig. It's nice to know that my effort to be clear is appreciated. However, reading your post, I noticed that I posted in the wrong forum.

katz
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

D'oh! :wink:
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You are correct (at least as far as I know): DSWaitForJob interrogates the RT_STATUSnnn table for the job. This table sometimes does not get updated, which is why killed jobs sometimes appear to stay in a "Running" state forever.

However, if the job status shows as "Finished", one of the active-stage records in RT_STATUSnnn may not have been updated. These records are used by the Monitor. Does the Monitor show all stages as finished when this problem occurs?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
katz
Charter Member
Posts: 52
Joined: Thu Jan 20, 2005 8:13 am

Post by katz »

Yes, the monitor shows that all the stages have completed.

In a couple of cases, the DSWaitForJob that "randomly" failed to detect that the called job had finished was the one executed in an after-job routine. However, the problem has occurred just as often in jobs that do not use an after-job routine, so I don't think the issue is related to the routine.

I have recompiled all the jobs, but that has not made any difference.

As I mentioned, this problem did not occur before we recently implemented Pluggable Authentication Modules (PAM), which requires running uvregen. Although the only change made to the uvconfig source was setting the parameter AUTHENTICATION 1, I can't help wondering whether the regenerated UV object has some issue.

Also, I discovered that the dsepam entry was never created in the pam.conf file. I can't see any direct relationship between that entry and my symptoms (all users are able to connect without it), but I've nevertheless requested that the dsepam entry be added so I can rule out that possibility.

Thanks,
katz
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Katz,

I've often implemented a small loop instead of the non-interruptible DSWaitForJob() call. The loop calls DSGetJobInfo() to get the job status and, if the job is still running, waits a couple of seconds and tries again. That way I can call DSLogFatal() if I end up waiting too long. Although this does not fix whatever causes DSWaitForJob() to hang, it lets you control how the process fails.
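A minimal sketch of that polling loop, written in Python rather than DS BASIC so it can be run on its own. It assumes `get_status` is a hypothetical stand-in for `DSGetJobInfo(hJob, DSJ.JOBSTATUS)`, the string `"RUNNING"` stands in for `DSJS.RUNNING`, and raising the hypothetical `WaitTimeout` exception plays the role of `DSLogFatal()`. The default timeout and poll interval are arbitrary assumptions, not values from this thread.

```python
import time
from typing import Callable

# Hypothetical stand-in for the DSJS.RUNNING status constant.
RUNNING = "RUNNING"


class WaitTimeout(Exception):
    """Hypothetical: raised where a DS BASIC job would call DSLogFatal()."""


def wait_for_job(get_status: Callable[[], str],
                 max_wait: float = 3600.0,
                 poll_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until the job is no longer running.

    Returns the first non-running status seen, or raises WaitTimeout
    once max_wait seconds have elapsed with the job still running.
    """
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status != RUNNING:
            return status  # job finished (or aborted, etc.): report it
        if clock() >= deadline:
            # Fail on our own terms instead of hanging indefinitely.
            raise WaitTimeout(f"job still running after {max_wait} seconds")
        sleep(poll_interval)  # wait a couple of seconds, then check again
```

The `clock` and `sleep` parameters are only there so the loop can be exercised without real waiting; in normal use the defaults apply.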
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The highlighted message means that DSWaitForJob() returns immediately under either of the following two conditions:
  • the job on the job handle has finished
  • the job on the job handle has been started again after finishing on the same job handle (that is, without an intervening call to the DSDetachJob() function)
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Check whether there were any network glitches or unusual system activity at that time.

You can look for related messages in /var/adm/messages.
Simplicity is the ultimate sophistication
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Different records in RT_STATUSnnn have different structures. Only the first five fields are common to all record types. There are records for the job, for each active stage, and for each "resource".
katz
Charter Member
Posts: 52
Joined: Thu Jan 20, 2005 8:13 am

Post by katz »

The underlying problem with the TZ environment variable was resolved by restarting the cron daemon. The workaround assignment made in the dsenv file can now be removed, and there have been no more incidents of jobs hanging in DSWaitForJob.