Random abnormal termination of jobs

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chinek
Participant
Posts: 75
Joined: Mon Apr 15, 2002 10:09 pm
Location: Australia

Random abnormal termination of jobs

Post by chinek »

Hi,

We have just recently upgraded to DS 7.5 on a Solaris 2.8 SUN server.
We are having problems with jobs (no job in particular) aborting with "Abnormal termination of stage P331LoadAssetSourceNEMS00..Transform detected". It can happen to just about any job, but one observation is that it tends to happen when a large number of jobs are run concurrently. The same jobs run successfully most of the time. These are also very simple ETL jobs that read from a source and write out sequential files.

Has anyone else had the same problem ?
Let me know if you need more information.

Nick
Anjan Roy
Participant
Posts: 46
Joined: Mon Apr 12, 2004 9:51 am
Location: USA

Re: Random abnormal termination of jobs

Post by Anjan Roy »

We have faced the same issue and have an open ticket with Ascential on it. As a workaround on our end, we have introduced a 30-second delay in the shell script that calls the DataStage job.
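A minimal sketch of what such a delay could look like, assuming the intent is to keep jobs from all starting at the same instant. The function name `launch_staggered` and the variable `LAUNCH_DELAY` are hypothetical, and the post does not say where exactly the delay sits in the original script.

```shell
#!/bin/sh
# Hypothetical helper: start each command in the background, pausing
# between launches so the jobs do not all start at once.
# LAUNCH_DELAY is an assumed variable name; 30 seconds matches the post.
LAUNCH_DELAY="${LAUNCH_DELAY:-30}"

launch_staggered() {
    first=1
    for cmd in "$@"; do
        # Sleep before every launch except the first one.
        if [ "$first" -eq 0 ]; then
            sleep "$LAUNCH_DELAY"
        fi
        first=0
        sh -c "$cmd" &
    done
    wait   # block until every background command has finished
}
```

Each argument is one complete command line, for example a `dsjob -run` call with your own project and job names. The jobs still overlap, so total batch time grows only by the accumulated delays.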
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Make sure your T30FILE setting is high enough to support the number of jobs executing simultaneously. Your problem is a common one. Search the forum for discussions about the uvconfig file and recommended settings. The abnormal terminations can happen when there are not enough internal pointers available to address all of the open dynamic hash files (each job has log, status, config, and other dynamic hash files open).
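As a rough sketch, the current value can be read from the uvconfig file with a one-line filter. This assumes the file uses plain "NAME value" lines, and the function name `t30file_value` is hypothetical; check the layout of your own file before relying on it.

```shell
# Print the T30FILE value from a uvconfig-style file.
# Assumption: settings appear as "NAME value" lines; verify this
# against your own uvconfig before trusting the result.
t30file_value() {
    awk '$1 == "T30FILE" { print $2; exit }' "$1"
}
```

It would typically be pointed at the uvconfig file in the DataStage engine directory, e.g. `t30file_value "$DSHOME/uvconfig"`.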
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chinek
Participant
Posts: 75
Joined: Mon Apr 15, 2002 10:09 pm
Location: Australia

Post by chinek »

Hi
T30FILE is set to 2048 on this server, but the problem is still occurring. What I have seen is that it seems to happen less often when the jobs are run sequentially from the job control, as opposed to running them in parallel through the job control.
But that is not good for us because then some of the batches will simply take too long to complete.

Thanks for your suggestion though.

Nick
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Release 5.x running on Solaris 2.8 had an issue, characterized by random abnormal terminations under heavy system load, that required both a Sun patch and a DS patch. Release 6+ incorporated the DS-side fixes, but I believe the Sun patch is still required. You may want to contact tech support and verify that the patch set and kernel parameters on your machine are what they need to be.
Kenneth Bland
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

Take a leap of faith and change your dsenv in the following way:

Add "/usr/lib/lwp" to the front of LD_LIBRARY_PATH, restart DS and your problem will magically disappear :wink:

So it should look something like
LD_LIBRARY_PATH=/usr/lib/lwp:...

This is a workaround for a known thread problem in Solaris/DataStage.
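One way to write that change so it is safe to source more than once is sketched below. The function name `prepend_lwp` is hypothetical; only the path and the variable come from the thread.

```shell
# Sketch for dsenv: put /usr/lib/lwp at the front of LD_LIBRARY_PATH,
# but only if it is not already the first entry.
prepend_lwp() {
    case "${LD_LIBRARY_PATH:-}" in
        /usr/lib/lwp|/usr/lib/lwp:*) ;;        # already first, nothing to do
        "") LD_LIBRARY_PATH=/usr/lib/lwp ;;    # variable was empty or unset
        *)  LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH ;;
    esac
    export LD_LIBRARY_PATH
}
```

Calling `prepend_lwp` once after the existing LD_LIBRARY_PATH assignment in dsenv would give the same result as the line above, without stacking duplicate entries if the file is sourced again.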

Ogmios
In theory there's no difference between theory and practice. In practice there is.
chinek
Participant
Posts: 75
Joined: Mon Apr 15, 2002 10:09 pm
Location: Australia

Post by chinek »

Hi,

Yes, adding /usr/lib/lwp to LD_LIBRARY_PATH seems to have done the trick.

For the benefit of anyone else having this problem: you can just modify the dsenv file, and you do not have to bounce the server process to do this.

Thanks for the help ogmios.

Nick
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

chinek wrote:hi
....

Thanks for the help ogmios.

Nick
I forgot that a bounce is not required :oops: . By the way, the solution is from Ascential; it only took them about half a year to figure it out :wink:

Ogmios
netland
Participant
Posts: 12
Joined: Tue Apr 08, 2003 11:43 pm

What about DS6 on AIX 5?

Post by netland »

Any idea if there is a similar fix for AIX?
winterb1
Participant
Posts: 4
Joined: Tue Mar 16, 2004 2:40 pm

Post by winterb1 »

Anyone know of a fix to these random Abnormal Termination issues for a Win2k box running 7.0.0?
billsklar
Participant
Posts: 17
Joined: Tue Jul 13, 2004 9:42 am

Post by billsklar »

Double that on the Windows version. We've been experiencing random terminations for the last 8 months. Last night 3 jobs failed with this error or similar:
jb000PlyrSessionGroupALL6.9.Copy_of_Link_Partitioner_62.ww: ds_ipcopen() - call to OpenFileMapping() failed - The system cannot find the file specified.
cecilia
Participant
Posts: 33
Joined: Thu Jan 15, 2004 9:55 am
Location: Argentina

Post by cecilia »

In my case, Ascential support suggested the change already posted:
LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

It worked fine for a couple of weeks, but since the behavior is random, the problem still shows up from time to time.
The ticket was reopened.

Regards

PS: Sun Solaris