Random abnormal termination of jobs

chinek · Post by **chinek** » Tue Feb 08, 2005 4:41 pm

Hi,

We have just recently upgraded to DS 7.5 on a Solaris 2.8 SUN server.
We are having problems with jobs (no job in particular) aborting with "Abnormal termination of stage P331LoadAssetSourceNEMS00..Transform detected". This can happen to just about any job, but one observation is that it tends to happen when a high number of jobs are run concurrently. These same jobs run successfully most of the time. They are also very simple ETL jobs that read from a source and write out sequential files.

Has anyone else had the same problem?
Let me know if you need more information.

Nick

Anjan Roy · Post by **Anjan Roy** » Tue Feb 08, 2005 5:00 pm

We have also faced this issue and have an open ticket with Ascential on it. As a workaround at our end, we have introduced a 30-second delay in the shell script that calls the DataStage job.
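A minimal sketch of that kind of staggered start, assuming the delay goes between consecutive job launches (the post does not say exactly where it sits). `LAUNCH_CMD`, `STAGGER_SECONDS` and `launch_staggered` are made-up names for illustration, and the default launcher is just `echo` so the script can be dry-run safely.

```shell
#!/bin/sh
# Hypothetical launcher command; defaults to echo for a harmless dry run.
LAUNCH_CMD=${LAUNCH_CMD:-echo}
# Seconds to pause between job starts (30 in the workaround described above).
STAGGER_SECONDS=${STAGGER_SECONDS:-30}

# Start each named job in the background, pausing between consecutive starts.
launch_staggered() {
    first=1
    for job in "$@"; do
        if [ "$first" -eq 0 ]; then
            sleep "$STAGGER_SECONDS"
        fi
        first=0
        # LAUNCH_CMD is left unquoted on purpose so it can carry options.
        $LAUNCH_CMD "$job" &
    done
    wait   # block until every background job has finished
}
```

Call it as `launch_staggered JobA JobB JobC`. If the jobs are started with the `dsjob` command line client, `LAUNCH_CMD` could be set to something like `dsjob -run -jobstatus MyProject`, where `MyProject` is a placeholder project name.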

kcbland · Post by **kcbland** » Tue Feb 08, 2005 9:09 pm

Make sure your T30FILE setting is high enough to support the number of jobs executing simultaneously. Your problem is a common one. Search the forum for discussions about the uvconfig file and recommended settings. The abnormal terminations can happen when there are not enough internal pointers available to address all of the open hash files (each job has log, status, config, and other dynamic hash files open).
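To see what the setting currently is, something like the following could be used. It is only a sketch: it assumes the uvconfig file holds plain `NAME value` lines with `#` comments, and `uvconfig_value` is a made-up helper name, not a DataStage tool.

```shell
#!/bin/sh
# Print the value of one tunable from a uvconfig-style file.
# $1 = parameter name, $2 = path to the file.
# Assumes "NAME value" lines; comment lines start with "#" and never match.
uvconfig_value() {
    while read -r key value rest; do
        if [ "$key" = "$1" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$2"
    return 1   # parameter not found
}
```

For example `uvconfig_value T30FILE "$DSHOME/uvconfig"`, assuming the file lives in the engine directory that `$DSHOME` points to on your install.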

chinek · Post by **chinek** » Tue Feb 08, 2005 10:01 pm

Hi,
T30FILE is set to 2048 on this server but the problem is still occurring. What I have seen is that it seems to happen less often if the jobs are run sequentially from the job control as opposed to running them in parallel through the job control.
But that is not good for us, because some of the batches would then simply take too long to complete.
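A possible middle ground between fully sequential and fully parallel is to cap how many jobs run at once. The sketch below does this from a shell script rather than from job control, which is an assumption about how the batch could be driven. `MAX_PARALLEL`, `LAUNCH_CMD` and `run_in_batches` are made-up names, and the launcher defaults to `echo` for a dry run.

```shell
#!/bin/sh
# Hypothetical settings: at most this many jobs run at the same time.
MAX_PARALLEL=${MAX_PARALLEL:-4}
# Hypothetical launcher command; defaults to echo for a harmless dry run.
LAUNCH_CMD=${LAUNCH_CMD:-echo}

# Run the named jobs in groups of MAX_PARALLEL, waiting for each
# group to finish before the next group is started.
run_in_batches() {
    running=0
    for job in "$@"; do
        $LAUNCH_CMD "$job" &
        running=$((running + 1))
        if [ "$running" -ge "$MAX_PARALLEL" ]; then
            wait          # let the current group drain before starting more
            running=0
        fi
    done
    wait                  # wait for the last, possibly partial, group
}
```

This is the simplest form of throttling: a slow job holds up its whole group, so it trades some run time for a lower peak load.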

Thanks for your suggestion though.

Nick

kcbland · Post by **kcbland** » Tue Feb 08, 2005 10:13 pm

Release 5.x running on Sun 2.8 had issues, characterized by random abnormal terminations under heavy system load, that required both a Sun patch and a DS patch. Release 6+ incorporated the DS-side fixes, but I believe the Sun patch is still required. You may want to contact tech support and verify that the patch set and kernel parameters on your machine are what they need to be.

ogmios · Post by **ogmios** » Wed Feb 09, 2005 3:09 am

Take a leap of faith and change your dsenv in the following way:

Add "/usr/lib/lwp" at the front of LD_LIBRARY_PATH, restart DS, and your problem will magically disappear.

So it should look something like
LD_LIBRARY_PATH=/usr/lib/lwp:...

This is a workaround for a known thread problem in Solaris/DataStage.
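As a sketch of what the dsenv lines could look like in full: this form prepends the directory without leaving a trailing colon when LD_LIBRARY_PATH starts out empty (an empty entry makes the loader search the current directory). As far as I know, /usr/lib/lwp on Solaris 8 holds an alternate thread library, which is why putting it first changes behaviour, but treat that explanation as an assumption.

```shell
# Fragment for dsenv, Bourne shell syntax.
# Put /usr/lib/lwp first so libraries there are found before the defaults.
# The ${VAR:+...} part adds ":<old value>" only if the old value is non-empty.
LD_LIBRARY_PATH=/usr/lib/lwp${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
```
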

Ogmios

chinek · Post by **chinek** » Wed Feb 09, 2005 4:10 pm

hi

yes adding the /usr/lib/lwp to LD_LIBRARY_PATH seems to have done the trick...

Just for the benefit of anyone else having this problem: you can just modify the dsenv file, and you do not have to bounce the server process for the change to take effect.

Thanks for the help ogmios.

Nick

ogmios · Post by **ogmios** » Wed Feb 09, 2005 4:33 pm

chinek wrote:hi
....

Thanks for the help ogmios.

Nick

Forgot about not being required to bounce. By the way, the solution is from Ascential; it only took them about half a year to figure that out.

Ogmios

netland · Post by **netland** » Thu Feb 10, 2005 4:48 am

Any idea if there is a similar fix for AIX?

winterb1 · Post by **winterb1** » Thu Feb 10, 2005 6:50 pm

Anyone know of a fix to these random Abnormal Termination issues for a Win2k box running 7.0.0?

billsklar · Post by **billsklar** » Fri Feb 11, 2005 12:38 pm

Double that on the Windows version. We've been experiencing random terminations for the last 8 months. Last night 3 jobs failed with this error or similar:
jb000PlyrSessionGroupALL6.9.Copy_of_Link_Partitioner_62.ww: ds_ipcopen() - call to OpenFileMapping() failed - The system cannot find the file specified.

cecilia · Post by **cecilia** » Mon Mar 28, 2005 12:31 pm

In my case, Ascential support suggested the change already posted:
LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

It worked fine for a couple of weeks, but the random failures still show up from time to time.
The ticket was reopened.

Regards

PS: Sun Solaris

DSXchange

Re: Random abnormal termination of jobs

What about DS 6 on AIX 5?