Heap size error in Generic job

aj
Participant
Posts: 2
Joined: Mon Jul 04, 2005 8:23 am

Heap size error in Generic job

Post by aj »

Hi DS Gurus,

Even though this has been discussed many times, I still couldn't find an exact answer to my issue.

I have a generic parallel job in IIS 8.0 which takes data from a dataset and fastloads it into Teradata. As this is a generic job, we have RCP enabled.
The job design is simple:
Dataset --> Column Generator (running in parallel mode, propagate partitioning) --> Transformer stage, adding 3 columns for date etc. (preserve sort order, auto partitioning) --> Teradata Enterprise stage (round robin)

It's on AIX with a 2-server, 8-node configuration.
$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) 2097151
nofiles(descriptors) 10240
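
One caveat: parallel jobs inherit their ulimits from the dsrpcd daemon, not from the login shell above, so the effective limits a job sees can differ from this output. A sketch of checking (and, if needed, raising) the data segment limit on AIX; the stanza name "dsadm" is just an example for whatever user starts the engine:

$ # run as the user that starts dsrpcd:
$ ulimit -d     # soft data segment limit, in kbytes
$ ulimit -Hd    # hard data segment limit

# /etc/security/limits stanza; -1 means unlimited.
# dsrpcd must be restarted to pick up a change.
dsadm:
        data = -1
        data_hard = -1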

I can see we have enough space on the resource (1300 GB) and scratch (247 GB) disks.

The input dataset has 47+ million records, with an overall dataset size of 12.07 GB.

My job is failing with a heap size error and throws other errors as below:
APT_ParallelSortMergeOperator,0: Unbalanced input from partition 1: 10000 records buffered [parallelsortmerge/parallelsortmerge.C:781]

APT_ParallelSortMergeOperator,0: The current soft limit on the data segment (heap) size (2147483645) is less than the hard limit (2147483647), consider increasing the heap size limit

APT_ParallelSortMergeOperator,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed. [error_handling/exception.C:132]

APT_CombinedOperatorController,3: Fatal Error: Unable to allocate communication resources [iomgr/iomgr.C:227]
node_node1: Player 1 terminated unexpectedly. [processmgr/player.C:160]
APT_CombinedOperatorController,1: Fatal Error: Unable to allocate communication resources [iomgr/iomgr.C:227]
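
(As a side note, the hard limit in the second message, 2147483647, is exactly 2^31 - 1, i.e. the 2 GB ceiling of a 32-bit address space, so this looks like a 32-bit process limit rather than the machine running out of memory. A quick way to check the bitness of the engine binary on AIX, assuming osh is on the PATH:)

$ # "64-bit XCOFF" in the output means a 64-bit binary;
$ # a 32-bit osh is capped near 2 GB of heap no matter what ulimit says
$ file `which osh`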

One of the suggestions we got is to split the data and try the load, or to change the job design. I don't think DataStage will fail because of this much volume.
Moreover, the same job runs successfully in another environment with 46.5+ million records and an 11.55 GB dataset.

Can you please help me with this?

Regards,
Aj
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

At first glance it appears that you are using a Sort/Merge collector (on the input to the Teradata stage?) and have wildly different row counts coming from the different partitions. You claim to be using Round Robin, but there's an APT_ParallelSortMergeOperator featuring prominently in the error messages. Perhaps dumping the score will give you a better idea just what is going on.
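
To dump the score, set APT_DUMP_SCORE before the run, either in dsenv or as a job-level environment variable (a sketch; where you set it is up to you):

$ # in dsenv, or added as a job parameter defaulting to True
APT_DUMP_SCORE=True
export APT_DUMP_SCORE

The score is then written to the job log and shows every operator the framework actually executed, including any tsort or parallelsortmerge it inserted on your behalf, along with the partitioning on each link.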
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ghila
Premium Member
Posts: 41
Joined: Mon Mar 15, 2004 2:37 pm
Location: France

Re: Heap size error in Generic job

Post by ghila »

Hello,

We ran into similar trouble. It seems some heap allocation limitations are due to the setting of the AIX environment variable LDR_CNTRL.
You might check how it is set in your "dsenv" file.
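
For reference, a typical dsenv entry looks something like this (a sketch; the right MAXDATA value for your system is the part to verify):

# in $DSHOME/dsenv
LDR_CNTRL=MAXDATA=0x80000000
export LDR_CNTRL
# 0x80000000 gives a 32-bit process roughly 2 GB of private data;
# on AIX, MAXDATA can be raised as far as 0xD0000000, and an @DSA
# suffix enables dynamic segment allocation.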

This link can also be useful:
http://www-01.ibm.com/support/docview.w ... wg21411997
Regards,

Daniel
aj
Participant
Posts: 2
Joined: Mon Jul 04, 2005 8:23 am

Re: Heap size error in Generic job

Post by aj »

We do have LDR_CNTRL set in the job as well. The value is set to 0x80000000, as suggested on the IBM support site.

Ray,
Yes, I think the round robin used in the Teradata stage is making DS insert a sort/merge internally while processing. At this moment I cannot change this and try rerunning the job, as this is in prod...
And the same job is definitely running fine in a different environment with 46.5+ million records.
prasanna_anbu
Participant
Posts: 42
Joined: Thu Dec 28, 2006 1:39 am

Re: Heap size error in Generic job

Post by prasanna_anbu »

aj wrote: We do have LDR_CNTRL set in the job as well. The value is set to 0x80000000, as suggested on the IBM support site. [...]
Please check these settings:
APT_DEFAULT_TRANSPORT_BLOCK_SIZE = 131072
APT_PHYSICAL_DATASET_BLOCK_SIZE = NULL
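
In dsenv that would look something like this (a sketch; adjust the values to your environment):

APT_DEFAULT_TRANSPORT_BLOCK_SIZE=131072
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE
# leaving APT_PHYSICAL_DATASET_BLOCK_SIZE unset (NULL) lets the
# framework fall back to its default dataset block size
unset APT_PHYSICAL_DATASET_BLOCK_SIZE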