Strange Datastage job behavior!

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Strange Datastage job behavior!

Post by splayer »

I have a sequence job which has several job activity stages and execute command stages. Sometimes, when I execute the sequence job, one of the jobs would just hang. No error messages at all. I have to stop the sequence job. When I reset the jobs and restart, the exact same job runs perfectly fine and immediately without any problems. Has anybody seen this?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Keep your logs in check and perform cleanup of the &PH& directory regularly. The job might not actually be stuck; it may be running, or taking a long time to start up, while the status update lags behind.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

What is &PH& directory?
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison (1847-1931)
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

So is &PH& a directory? How do I know its value?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, it is a directory - the PHantom directory, a directory leveraged by the background (i.e. 'phantom') processes each job runs as. There's one in each Project directory. What do you mean by "how do I know its value"? :?
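Since '&' is special to the shell, the name has to be quoted before you can look inside it. A minimal demo in a scratch directory (the DSD.RUN filename is just an illustration of what a real &PH& contains):

```shell
# '&PH&' is a perfectly legal UNIX filename; only the shell's
# parsing of '&' gets in the way, so the name just needs quoting.
demo=$(mktemp -d)                    # stand-in for a Project directory
mkdir "$demo/&PH&"
touch "$demo/&PH&/DSD.RUN.12345"     # hypothetical per-run phantom file
listing=$(ls "$demo/&PH&")
echo "$listing"
rm -rf "$demo"
```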
-craig

"You can never have too many knives" -- Logan Nine Fingers
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

When I went to the project directory, I saw the following:

D_&COMO&
&COMO&
&SAVEDLISTS&

but I did not see something like &PH&. When I tried to do:
cd &COMO&
-----------------------------------------------------------------------------
I got the following message:
[1] 15295
[2] 15296
myid@MyProject>-bash: cd: cd: No such file or directory
-bash: COMO: command not found

[1]- Exit 1 cd cd
[2]+ Exit 127 COMO
-----------------------------------------------------------------------------

If these are directories and I should clear whatever is in them, shouldn't I be able to go in them?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

splayer - check again; if you don't find a &PH& entry, then perhaps you (or someone else) deleted the directory. Do you have a D_&PH& entry in the project?

You need to execute "cd \&COMO\&" at the command line, since UNIX treats the ampersand as a special character.
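To make that concrete, any of the usual quoting forms hides the ampersands from the shell; unquoted, the shell launches background jobs instead, which is exactly the `[1] 15295` output pasted earlier. The demo below uses a scratch directory rather than a real project:

```shell
# Backslash-escaping works, as do single or double quotes:
#   cd \&COMO\&    cd '&COMO&'    cd "&COMO&"
demo=$(mktemp -d)            # scratch stand-in for the Project directory
mkdir "$demo/&COMO&"
cd "$demo"
cd \&COMO\&                  # '&COMO&' and "&COMO&" would also work
here=$(pwd)
echo "$here"
cd / && rm -rf "$demo"
```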
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

ArndW, so should I be clearing all of these folders every time before I run my job sequence?
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

I don't have an answer, but I have had this problem before, many times in the Windows version. It is usually due to a bug in the job that causes it to use up a lot of memory, but sometimes there is no apparent reason. I've contacted IBM support about it and they had no answers, other than "clear your log" and things like that, which was not the problem in my case. I ended up recreating the job (the same way) and it became stable. I think DataStage jobs can become corrupt. I don't know how or why, but it has happened a few times before, and usually saving it as another job will fix it.
Mike
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

michaeld, this is not job specific. The exact same job runs perfectly the next minute. It happens to jobs at random.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Any kind of "hang" usually means waiting for a resource. Next time this occurs, check whether any kind of lock is held on the job sequence or on the controlled activity that appears to be hanging. Also check that there's plenty of free space in /tmp, in the file system identified by the UVTEMP configuration parameter, and in those named in your parallel job configuration file as resource disk and scratchdisk (particularly the latter).
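A quick way to check the free-space part of that advice (shown only for /tmp here; substitute the UVTEMP, resource disk, and scratchdisk paths from your own configuration):

```shell
# df reports free space; a hung job's scratchdisk filling up shows
# here as a low "Available" figure. /tmp stands in for the UVTEMP,
# resource disk, and scratchdisk paths from the configuration file.
avail=$(df -kP /tmp | awk 'NR==2 {print $4}')
echo "free KB on /tmp: $avail"
```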
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

So I should do/check for the following:

1) In the &PH& directory, I should delete all DSD.RUN and DSD.StageRun files. I see some DSD.OshMonitor files. Should I delete them as well? Are there any side effects?

2) I go to the /tmp folder and delete anything I can there. I see some files, but they are very small.

3) I go to the Scratch folder listed in the configuration file and clear everything in there.

4) My resource disk directory contains the binary segment files of datasets. Should I go ahead and delete these datasets? I need them later in my process.

5) My UVTEMP directory is defined as /tmp, which I have already dealt with in #2 above.

I would appreciate it if you could comment on this. Thanks.
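The pruning in #1 can be sketched with find, restricted to the DSD.* phantom files so nothing else is touched. The directory below is a mock stand-in; against a real &PH& you would add -mtime to spare files from jobs still running, and review the printed list before adding -delete:

```shell
# Mock &PH& cleanup: select only phantom log files, leave everything
# else (e.g. dataset segment files) strictly alone.
phdir=$(mktemp -d)                     # stand-in for a project's &PH& dir
touch "$phdir/DSD.RUN.1" "$phdir/DSD.StageRun.2" "$phdir/keep.me"
found=$(find "$phdir" -type f \( -name 'DSD.RUN*' -o -name 'DSD.StageRun*' \
    -o -name 'DSD.OshMonitor*' \) | sort)
echo "$found"                          # keep.me is not selected
rm -rf "$phdir"
```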