FATAL Error in transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

FATAL Error in transformer

Post by samyamkrishna »

TrnsValidJE,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

APT_TransformOperatorImplV214S0_NonZero0JB_TrnsValidJE in TrnsValidJE], partition 0 of 2, processID 2,478,262 on etlax006_01, player 77 terminated unexpectedly.


What should I be doing?
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

There are many posts with similar error messages. Doesn't any of them solve your issue?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

Yes, I checked out the other posts with the similar error.
I have set $DSIPC_OPEN_TIMEOUT = 680.

The ulimit output shows the following.

$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) unlimited

The dump score says it runs 260 processes on 3 nodes.

My job is really huge.

It has 15 transformers, 6 joins, 3 aggregators and two lookups.

Should I break up the job, or is there any other way out of this issue?

Let me know if there is anything else I can do other than splitting the job.

Regards,
Samyam
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Have you run the job sequentially (with only one node)? When you do, does it abort? Can you reproduce the error using just the transformer which is aborting and some logic around it (maybe a row generator before and a peek after)? Does the transformer which aborts call any parallel routines or perform a lot of function calls/data type conversions? You or a system admin should monitor system resources while the job starts up and executes to see if memory is exhausted by the job. Do this with both one node and three nodes.

Doing the above will help determine whether or not it's the particular transformer itself causing the abort. If the transformer doesn't abort when run alone, or the job runs with a single partition, then yes, your best solution is probably splitting the job apart.
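
If it helps to set up that single-node test, here is a rough sketch of how a one-node configuration file can be created and selected; the hostname and resource paths are placeholders for your environment, not your actual values:

# Hypothetical one-node configuration for a sequential test run (hostname/paths are examples)
cat > /tmp/one_node.apt <<'EOF'
{
    node "node1"
    {
        fastname "your_etl_host"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
EOF
# Point the test run at it
export APT_CONFIG_FILE=/tmp/one_node.apt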

My first impression is that the job is overly complex and/or doesn't follow best practices in design. 15 transformers is normally a high number for a single job, and with the joins and aggregators, sorts required for both, and the lookups, you may indeed be overloading the server.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

I executed the job with a single-node configuration.
It aborted again with the same error.

I executed the transformer alone with a row generator.
It works fine.


The transformer which aborts has a lot of substring functions.
I am reading a complex file using a Sequential File stage.

The complex file has 7 different kinds of records.
I am reading the full record in the file into a single column and then splitting the records into different columns in this transformer.

Another big issue is that the target is SAP; I am loading SAP using a BAPI.

Now even if I am not loading any data into all the target SAP columns, the columns still appear in the transformer. There are 300 target columns; even though I am loading only 10 of them, the other 290 columns are defaulted to ''.

I will be sitting with my admin tomorrow to see if I can find out anything.

In the meantime, is there anything else I can try?

Regards,
Samyam
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

Split the job.
ramkumar.krishnan
Participant
Posts: 3
Joined: Mon Jan 03, 2011 12:09 am
Location: Chennai

Post by ramkumar.krishnan »

samyamkrishna wrote: Split the job.

It seems like system resource usage exceeded the allocated resources. Below are some possible causes of the job termination:

# The operating system limit reached the maximum of its capacity (number of processes allocated for an individual user).
# The scratch disk space filled up while the job was running.
# Try tuning the parameter "APT_DISABLE_COMBINATION" (see the example below).
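
For what it's worth, a minimal sketch of how that variable is typically set for a debug run (shown at the shell level here; defining it as a job-level parameter works the same way):

# Debugging aid only: run each operator as its own process instead of combining them
# (note that this increases the total process count)
export APT_DISABLE_COMBINATION=True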

Thanks
Ramkumar
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Disabling operator combination will generally increase the number of processes. Normally it's used for debugging failing jobs.

It sounds like splitting the job is your best long-term solution at the moment. I would also investigate having the limit on max number of processes per user increased. For most unix installations, the default is 1024--which doesn't take into account the parallel process nature of DataStage's PX engine.
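
A rough sketch of the checks involved, assuming the DataStage user's shell supports them; the AIX commands and the example value are illustrative only and should be sized by your administrator:

# If the shell supports it (bash/ksh93), show the soft per-user process limit
ulimit -u

# On AIX (which the ulimit output above resembles), the per-user process cap is
# a sys0 attribute that an administrator can inspect and raise:
lsattr -El sys0 -a maxuproc
# chdev -l sys0 -a maxuproc=4096    # example value only; requires root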

Also, your server is probably undersized for the amount of work it is now being asked to perform. Since you're running an older version of DataStage, it's likely been around for several years and hasn't been upgraded while workload has probably increased beyond the original design.
- james wiles


All generalizations are false, including this one - Mark Twain.
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

It has 15 transformers, 6 joins, 3 aggregators and two lookups.
I think you should split the job.