
Random abnormal termination of jobs

Posted: Tue Feb 08, 2005 4:41 pm
by chinek
Hi,

We have just recently upgraded to DS 7.5 on a Solaris 2.8 SUN server.
We are having problems with jobs (no job in particular) aborting with "Abnormal termination of stage P331LoadAssetSourceNEMS00..Transform detected". One observation is that it tends to happen when a high number of jobs are run concurrently. These same jobs run successfully most of the time. The problem can occur in any of the jobs and is not limited to one job. Also, these are very simple ETL jobs that read from a source and write out sequential files.

Has anyone else had the same problem ?
Let me know if you need more information.

Nick

Re: Random abnormal termination of jobs

Posted: Tue Feb 08, 2005 5:00 pm
by Anjan Roy
We have also faced this issue, and we have an open ticket with Ascential on it. As a workaround at our end, we have introduced a 30 second delay in the shell script that calls the DataStage job.
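
A minimal sketch of that kind of wrapper, for illustration only. The post does not say where the delay sits, so placing the sleep before the job start is an assumption, and the names DELAY_SECONDS and run_job_delayed are made up here. The only DataStage piece used is the standard dsjob -run call.

```shell
#!/bin/sh
# Hypothetical wrapper: pause before starting each job so that
# concurrent launches are spaced out. 30 seconds is the delay
# mentioned in the post; override DELAY_SECONDS to change it.
DELAY_SECONDS=${DELAY_SECONDS:-30}

run_job_delayed() {
    # $1 = project name, $2 = job name
    sleep "$DELAY_SECONDS"
    dsjob -run "$1" "$2"
}
```

A scheduler script would then call something like run_job_delayed MyProject MyJob, where both names are placeholders.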

Posted: Tue Feb 08, 2005 9:09 pm
by kcbland
Make sure your T30FILE setting is high enough to support the number of jobs executing simultaneously. Your problem is a common one. Search the forum for discussions about the UVCONFIG file and recommended settings. The abnormal terminations can be related to not having enough internal pointers available to address all of the open hash files (jobs have log, status, config, and other dynamic hash files open).
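
To see what the setting currently is, something along these lines should work. It is a sketch that assumes the usual "NAME value" layout of the uvconfig file; show_t30file is a hypothetical helper name, not a DataStage command.

```shell
# Hypothetical helper: print the value on the T30FILE line of a
# uvconfig file. Pass the path to the file as the first argument.
show_t30file() {
    awk '$1 == "T30FILE" { print $2 }' "$1"
}
```

With dsenv sourced, it could be called as show_t30file "$DSHOME/uvconfig".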

Posted: Tue Feb 08, 2005 10:01 pm
by chinek
Hi
T30FILE is set to 2048 on this server but the problem is still occurring. What I have seen is that it seems to happen less often if the jobs are run sequentially from the job control, as opposed to running them in parallel through the job control.
But that is not good for us because then some of the batches will simply take too long to complete.

Thanks for your suggestion though.

Nick

Posted: Tue Feb 08, 2005 10:13 pm
by kcbland
Release 5.x running on Sun 2.8 had issues, characterized by random abnormal terminations under heavy system load, that required a Sun patch and a DS patch. Release 6+ incorporated the DS side fixes, but I believe the Sun patch is still required. You may consider contacting tech support and verifying that the patch set and kernel parameters on your machine are what they need to be.

Posted: Wed Feb 09, 2005 3:09 am
by ogmios
For a leap of faith, change your dsenv in the following way:

Add "/usr/lib/lwp" in front of the LD_LIBRARY_PATH, restart DS and your problem will magically disappear :wink:

So it should be something like
LD_LIBRARY_PATH=/usr/lib/lwp:...

This is a work around for a known thread problem in Solaris/DataStage.
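
A slightly more defensive version of the same dsenv change, as a sketch only. It prepends /usr/lib/lwp when LD_LIBRARY_PATH does not already start with it, so sourcing dsenv more than once does not keep growing the variable. The helper name prepend_lwp is made up for illustration and is not part of DataStage; Bourne shell syntax is assumed.

```shell
# Hypothetical dsenv fragment: make sure /usr/lib/lwp is the FIRST
# entry of LD_LIBRARY_PATH, without adding it twice.
prepend_lwp() {
    case "${LD_LIBRARY_PATH:-}" in
        /usr/lib/lwp|/usr/lib/lwp:*)
            ;;  # already in front, leave the path unchanged
        *)
            # Prepend; the ":rest" part is added only if the path was non-empty
            LD_LIBRARY_PATH="/usr/lib/lwp${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}
prepend_lwp
```

Running echo "$LD_LIBRARY_PATH" afterwards is a quick way to confirm that /usr/lib/lwp comes first.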

Ogmios

Posted: Wed Feb 09, 2005 4:10 pm
by chinek
hi

yes adding the /usr/lib/lwp to LD_LIBRARY_PATH seems to have done the trick...

Just for the benefit of anyone else having this problem: you can just modify the dsenv file; you do not have to bounce the server process to do this.

Thanks for the help ogmios.

Nick

Posted: Wed Feb 09, 2005 4:33 pm
by ogmios
chinek wrote:hi
....

Thanks for the help ogmios.

Nick
I forgot that a bounce is not required :oops: . By the way, the solution is from Ascential; it only took them about half a year to figure that out :wink:

Ogmios

what about DS6 on aix 5

Posted: Thu Feb 10, 2005 4:48 am
by netland
Any idea if there is a similar fix for the AIX ?

Posted: Thu Feb 10, 2005 6:50 pm
by winterb1
Anyone know of a fix to these random Abnormal Termination issues for a Win2k box running 7.0.0?

Posted: Fri Feb 11, 2005 12:38 pm
by billsklar
Double that on the Windows version. We've been experiencing random terminations for the last 8 months. Last night 3 jobs failed with this error or similar:
jb000PlyrSessionGroupALL6.9.Copy_of_Link_Partitioner_62.ww: ds_ipcopen() - call to OpenFileMapping() failed - The system cannot find the file specified.

Posted: Mon Mar 28, 2005 12:31 pm
by cecilia
In my case, Ascential support suggested the change already posted:
LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

It worked fine for a couple of weeks, but since the behavior is random, the problem still shows up from time to time.
The ticket was reopened.

Regards

PS: Sun Solaris