DS jobs hang

jane_jiang
Participant
Posts: 3
Joined: Tue May 04, 2004 1:45 pm

DS jobs hang

Post by jane_jiang »

In our daily production ETL load, DS jobs hang for no apparent reason, and it is a different job each time. When a job hangs, there is no error message in the job log and the job status still shows "Running". We have to stop the job manually from the DS Director and kill the phantom on the server box. Most of the time, the job completes successfully on the re-run without any code changes.

We have tried restarting the DS server and cleaning up the project's &PH& directory. "Do not timeout" is also checked. None of these seemed to work.

Has anyone experienced this? Can anyone share possible causes?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Reasons jobs "stall" at the end of the job:

1. Full job logs automatically purging
2. After-stage or after-job routines going out and doing something
3. After-stage or after-job routines in two jobs locking same object (database table, sequential file, etc)
4. Commit is issued, database doing its thing
5. &PH& project directory full of lots of files
6. DS server overloaded, takes a long time to catch up to the jobs' true statuses
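
For reason 5, one quick check is to count the files in &PH& and see how many are stale. This is a minimal Python sketch: the function name and the seven-day cutoff are assumptions made for illustration, not DataStage settings, and it only reports counts without deleting anything.

```python
import os
import time


def summarize_ph_dir(ph_dir, older_than_days=7, now=None):
    """Return (total_files, stale_files) for the given directory.

    A file counts as stale when its modification time is more than
    older_than_days before `now` (defaults to the current time).
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400  # 86400 seconds per day
    total = 0
    stale = 0
    for entry in os.scandir(ph_dir):
        if entry.is_file():
            total += 1
            if entry.stat().st_mtime < cutoff:
                stale += 1
    return total, stale
```

Pass it the &PH& directory under your own project path. A large stale count while no jobs are running suggests the directory is due for a cleanup.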
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you kill the job processes on the server, they will never get the chance to update their status values (which are read by the Director client) and so will appear to remain in a "running" state.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jane_jiang
Participant
Posts: 3
Joined: Tue May 04, 2004 1:45 pm

Post by jane_jiang »

Thanks for your helpful information.

Kcbland, in our case the job hangs forever. The job is either trying to query the Oracle database or trying to do an insert. Regarding reason 6, how can I find out whether the DS server is overloaded? Is there a way to monitor it and fix the problem?

Ray, our problem is that the job runs for too long and we have to stop it from the Director client, but the phantom is still there on the server side even after the job has been stopped. Do you know if there is a better way to kill the phantom?

Do you think this hanging issue might be related to a timeout between the DS server box and the Oracle server? Is there a variable I can tune in the uvconfig file?
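
Before killing anything, it helps to see exactly which phantom processes are still alive for the job. The sketch below is a rough illustration in Python: it assumes the leftover job processes show the word "phantom" in their command line, and the helper names are made up for this example. It only lists processes and leaves the decision to kill them to you.

```python
import subprocess


def find_phantoms(ps_output, job_name=None):
    """Return (pid, command) pairs for lines that mention 'phantom'.

    ps_output is expected to hold one process per line in the form
    "<pid> <command line>". If job_name is given, only commands that
    also contain that name are returned.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, cmd = int(parts[0]), parts[1]
        if "phantom" in cmd and (job_name is None or job_name in cmd):
            matches.append((pid, cmd))
    return matches


def list_phantoms(job_name=None):
    """Run ps with POSIX output options (no header) and filter the result."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return find_phantoms(result.stdout, job_name)
```

Calling `list_phantoms("MyJobName")` (a placeholder job name) on the server returns the PIDs to look at. An empty list means nothing matching is left running.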

Again, thanks so much.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

prstat (Solaris 2.8+), iostat, vmstat, and netstat are commonly used tools for performance monitoring. Consider downloading top (prstat is a version of it) or glance from HP.
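
If you want a scripted check alongside those tools, the load averages that top reports are also available from Python. A minimal sketch, assuming a Unix host: the function names and the rule of thumb that a sustained load above 1.0 per CPU means "overloaded" are assumptions for illustration, so tune the threshold to your own box.

```python
import os


def is_overloaded(load_avgs, cpu_count, threshold=1.0):
    """Return True when the 5- and 15-minute load averages per CPU
    both exceed the threshold.

    load_avgs is a (1min, 5min, 15min) tuple as returned by os.getloadavg().
    The 1-minute value is ignored so that short spikes do not trigger it.
    """
    cpus = max(cpu_count, 1)
    _, five_min, fifteen_min = load_avgs
    return five_min / cpus > threshold and fifteen_min / cpus > threshold


def check_host(threshold=1.0):
    """Apply is_overloaded to the current host (Unix only)."""
    return is_overloaded(os.getloadavg(), os.cpu_count() or 1, threshold)
```

Running `check_host()` from cron during the load window and logging the result gives a rough timeline to compare against the times the jobs hang.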
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You're more likely to find a timeout to tune in the Oracle client software. There's nothing in uvconfig that would help.
raj_konig
Participant
Posts: 67
Joined: Thu Dec 22, 2005 12:27 am

Post by raj_konig »

Hi Jane,

In my experience, the query you are using in the job probably needs some tuning. Try running the same query directly in the database instead of through DataStage. This may help isolate the problem.

If not, then the query or job must be in a waiting state because the table you are trying to operate on is locked.

If you face a similar issue again, it is better not to forcibly stop the job. Check the status of the target table, and check whether the job or any table it accesses is locked.

This may fix your issue.

Thanks,
raj