Why won't this fatal error abort a job?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chenxs
Participant
Posts: 30
Joined: Mon Dec 27, 2004 3:11 am

Why won't this fatal error abort a job?

Post by chenxs »

Dear all,

I am getting this fatal error in a parallel job, but the job ends up in Finished status as if nothing were wrong... it has not aborted...

aggregate_1,0: Caught exception from runLocally(): APT_BadAlloc: Heap allocation failed..

Why won't this fatal error abort the job?

many thanks...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"Heap allocation failed" means that it couldn't get more memory when it demanded it. So it has spilled to disk, and continued to process, albeit not as fast.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chenxs
Participant
Posts: 30
Joined: Mon Dec 27, 2004 3:11 am

thanks

Post by chenxs »

thanks, ray

but I want this fatal error to abort the job, not let it finish...

how can I do that?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You cannot do that. It is an internal message from the PX engine. It really shouldn't be an error message, but just a warning message.

Why do you want this to abort your job? As Ray has explained, it means that there is no more virtual memory space to hold the data in your aggregation (might you be able to raise your ulimit values?) so PX has begun using disk space to hold temporary data.

You can use PX message handling to demote this message to a warning, or you could pre-sort your data coming into the aggregator stage so that no temporary space is required.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

hello,

open a case with Ascential.

there are some well-known bugs where fatal errors do not abort a job.
chenxs
Participant
Posts: 30
Joined: Mon Dec 27, 2004 3:11 am

thx

Post by chenxs »

If this fatal error won't abort the job, the batch will continue...

but the result of this job is wrong (some data is missing)... so we need this job to abort...

and I have already pre-sorted the data...
chenxs
Participant
Posts: 30
Joined: Mon Dec 27, 2004 3:11 am

Post by chenxs »

Maybe I need to write an after-job routine to abort the job when this fatal error appears.......
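(One way to do this from the controlling batch script, rather than in an after-job routine, is to scan the job's log for FATAL entries after the run and fail the batch yourself. The sketch below assumes the dsjob command-line interface with its -logsum option is available in your release; the project and job names are placeholders.)

Code:

#!/bin/sh
# Sketch of a post-run guard: treat any FATAL entry in the job's log as a
# failure, even when the job itself shows a "Finished" status.
# "myproject" and "myjob" are placeholders; dsjob usually lives in $DSHOME/bin.

PROJECT=myproject
JOB=myjob

# Count FATAL log entries. Note: -logsum reports the whole retained log,
# so either auto-purge the log to the last run or record the newest event
# ID (dsjob -lognewest) before the run and only count entries after it.
FATALS=`$DSHOME/bin/dsjob -logsum -type FATAL $PROJECT $JOB 2>/dev/null | wc -l`

if [ $FATALS -gt 0 ]
then
    echo "$JOB logged $FATALS fatal event(s) - stopping the batch" >&2
    exit 1
fi

echo "$JOB: no fatal events in the log"
exit 0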
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Ray and I have been saying that this error message seems to be a non-fatal one - meaning that your data is going to be the same regardless of whether this message is in the log or not.

Your incoming data needs to be sorted on the key columns of your aggregation. It evidently is not sorted that way, because if it were, the Aggregator stage would have no need for interim storage and you wouldn't be getting this error message. If you check the sorting in your job, you can make this whole issue moot.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You are not losing any data when this message appears. All that is happening is that DataStage is being forced to use disk because there is not enough virtual memory available.

I agree with Arnd that it should not be a Fatal message.

Do check your ulimit settings for the user ID that processes DataStage jobs; you may be able to increase the amount of memory that that user can allocate. Your UNIX administrator will need to be involved, because only the superuser can increase a ulimit.
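(For reference, a quick way to see what the engine's user is currently allowed - just a sketch, and "dsadm" is only the conventional account name, so substitute whatever user your jobs actually run under.)

Code:

# Show all current limits for the DataStage user ("dsadm" is an assumption).
su - dsadm -c 'ulimit -a'

# The data segment limit (-d) is usually the one behind APT_BadAlloc;
# most shells report the value in kilobytes.
su - dsadm -c 'ulimit -d'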
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Post by Ultramundane »

You can set ulimit for a shell, and inherited shells will have the new limits. That is, you could set them in your profile, log out and log back in, then bounce DataStage.

Code:

cat ~/.profile
## DATASTAGE PROFILE
unset ENV

## DATA
ulimit -d 1048576

## MEMORY
ulimit -m 1048576

## NOFILES
ulimit -n 10000

## STACK
ulimit -s 262144

## CORE DUMP SIZE
ulimit -c 4194304

. ~/.dsadm

if [ -s "$MAIL" ]           # This is at Shell startup.  In normal
then echo "$MAILMSG"        # operation, the Shell checks
fi                          # periodically.

## DISPLAY SOME HELP INFO ON LOGIN
. ~/.menu

If osh itself is running into memory-allocation problems (the Lookup stage will fail when you use over 512 MB on the reference link), you can also change osh to use a larger memory allocation.

On AIX, you would enter the following to change the limit from 512 MB to 2 GB.

Code:

/usr/ccs/bin/ldedit -bmaxdata:0x80000000/dsa $APT_ORCHHOME/bin/osh
/usr/ccs/bin/ldedit:  File /Ascential/DataStage/PXEngine/bin/osh updated.
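(If you try this, it is worth confirming that the header edit took. On AIX, something along these lines should show the new value - a sketch from memory, so check the dump man page on your system.)

Code:

# Inspect the optional header of the patched osh binary; the maxdata field
# should now show 0x80000000 (2 GB).
dump -ov $APT_ORCHHOME/bin/osh | grep -i maxdata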
chenxs
Participant
Posts: 30
Joined: Mon Dec 27, 2004 3:11 am

Post by chenxs »

thanks all~

It turns out I get "Heap allocation failed" when /tmp runs out of space. The Aggregator stopped processing (so some data was missing) and then the job finished immediately.

Anyway, your suggestions are very useful to me, thanks.