
FATAL Error in transformer

Posted: Mon Jan 31, 2011 5:07 am
by samyamkrishna
TrnsValidJE,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

APT_TransformOperatorImplV214S0_NonZero0JB_TrnsValidJE in TrnsValidJE], partition 0 of 2, processID 2,478,262 on etlax006_01, player 77 terminated unexpectedly.


What should I be doing?

Posted: Mon Jan 31, 2011 8:26 am
by priyadarshikunal
There are many posts with similar error messages; don't any of them solve your issue?

Posted: Tue Feb 01, 2011 7:07 am
by samyamkrishna
Yes, I checked the other posts with similar errors.
I have set $DSIPC_OPEN_TIMEOUT = 680.

The ulimit output shows the following.

$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) unlimited

The dump score shows that it runs 260 processes on 3 nodes.

My job is really huge.

It has 15 transformers, 6 joins, 3 aggregators and 2 lookups.

Should I break up the job, or is there another way out of this issue?

Let me know if there is anything else I can do other than splitting the job.

Regards,
Samyam

Posted: Tue Feb 01, 2011 7:30 am
by jwiles
Have you run the job sequentially (with only one node)? When you do, does it abort?
Can you reproduce the error using just the transformer which is aborting and some logic around it (maybe a row generator before and a peek after)?
Does the transformer which aborts call any parallel routines or perform a lot of function calls/data type conversions?
You or a system admin should monitor system resources while the job starts up and executes to see if memory is exhausted by the job. Do this with both one node and three nodes; a rough sketch of what to watch is below.
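Something like this is all I mean by monitoring (a minimal sketch only; dsadm is a placeholder for your DataStage user, and the interval and count are arbitrary):

$ vmstat 5 120                  # free memory and paging/swap activity every 5 seconds
$ ps -fu dsadm | grep -c osh    # rough count of PX player (osh) processes as they spawn

If free memory or paging space collapses while those 260 processes start up, that points to overall memory exhaustion rather than a problem in the one transformer.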

Doing the above will help determine whether or not it's the particular transformer itself causing the abort. If the transformer doesn't abort when run alone, or the job runs successfully with a single partition, then yes, your best solution is probably splitting the job apart.

My first impression is that the job is overly complex and/or doesn't follow best practices in design. 15 transformers is normally a high number for a single job, and with the joins and aggregators, sorts required for both, and the lookups, you may indeed be overloading the server.

Regards,

Posted: Tue Feb 01, 2011 8:28 am
by samyamkrishna
I executed the job with a single-node configuration.
It aborted again with the same error.

I executed the transformer alone with a row generator.
It works fine.


The transformer which aborts has a lot of substring functions.
I am reading a complex file using a Sequential File stage.

The complex file has 7 different kinds of records.
I am reading the full record from the file into a single column and then splitting the records into different columns in this transformer.
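Roughly, the splitting logic looks like this (illustration only; the record-type codes and substring offsets here are made up, not my real layout):

awk '{
  rectype = substr($0, 1, 2)                          # first 2 bytes identify the record type
  if (rectype == "01")
    print substr($0, 3, 10) "|" substr($0, 13, 30)    # hypothetical offsets for type 01
  else if (rectype == "02")
    print substr($0, 3, 12) "|" substr($0, 15, 9)     # hypothetical offsets for type 02
  # ...and so on for the remaining five record types
}' input_file

The transformer derivations do the same thing, one substring of the single input column per output column.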

Another big issue is that the target is SAP; I am loading SAP using a BAPI.

Even the target SAP columns that I am not loading any data into still appear in the transformer. There are 300 target columns; even though I am loading only 10 of them, the other 290 columns are defaulted to ''.

I will be sitting with my admin tomorrow to see if I can find out anything.

In the meanwhile, is there anything else I can try, guys?

Regards,
Samyam

Posted: Wed Feb 02, 2011 2:13 am
by samyamkrishna
Split the job.

Posted: Wed Feb 02, 2011 2:37 am
by ramkumar.krishnan
samyamkrishna wrote: Split the job.

It seems like system resource usage exceeded the allocated resources. Below are some of the causes of job termination:

# The operating system limit has reached its maximum capacity (number of processes allocated to an individual user).
# Scratch disk space fills up while the job is running (see the quick check sketched below).
# Try tuning the parameter "APT_DISABLE_COMBINATION".
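For the scratch disk cause, a quick check while the job runs (sketch only; the paths are placeholders, use the resource scratchdisk entries from your configuration file):

$ df -k /ds/scratch1 /ds/scratch2    # watch these file systems fill as the job runs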

Thanks
Ramkumar

Posted: Wed Feb 02, 2011 2:53 am
by jwiles
Disabling operator combination will generally increase the number of processes. Normally it's used for debugging failing jobs.

It sounds like splitting the job is your best long-term solution at the moment. I would also investigate having the limit on the maximum number of processes per user increased. For most Unix installations, the default is 1024, which doesn't take into account the parallel-process nature of DataStage's PX engine.
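For reference, checking that limit looks roughly like this (sketch only; the exact commands depend on your platform, so confirm with your admin):

$ ulimit -u                       # per-user process limit, where the shell supports -u
$ lsattr -El sys0 -a maxuproc     # on AIX, the per-user process limit is the maxuproc attribute

Raising maxuproc on AIX needs root, e.g. chdev -l sys0 -a maxuproc=2048.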

Also, your server is probably undersized for the amount of work it is now being asked to perform. Since you're running an older version of DataStage, it's likely been around for several years and hasn't been upgraded while workload has probably increased beyond the original design.

Posted: Sun Feb 06, 2011 3:42 am
by ThilSe
samyamkrishna wrote: It has 15 transformers, 6 joins, 3 aggregators and 2 lookups.
I think you should split the job.