Hi DS Gurus,
Even though this has been discussed many times, I still couldn't find an exact answer for my issue.
I have a generic parallel job in IIS 8.0 that reads data from a dataset and FastLoads it into Teradata. Because the job is generic, we have RCP enabled.
The job design is simple:
Dataset --> Column Generator (running in parallel mode, propagate partitioning) --> Transformer stage, adding 3 columns for date etc. (preserve sort order, auto partition) --> Teradata Enterprise stage (round robin)
It's on AIX with a 2-server, 8-node configuration.
$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) 2097151
nofiles(descriptors) 10240
I can see we have enough space on the resource (1300 GB) and scratch (247 GB) disks.
The input dataset has 47+ million records, with an overall size of 12.07 GB.
The job fails with a heap size error and throws these messages:
APT_ParallelSortMergeOperator,0: Unbalanced input from partition 1: 10000 records buffered [parallelsortmerge/parallelsortmerge.C:781]
APT_ParallelSortMergeOperator,0: The current soft limit on the data segment (heap) size (2147483645) is less than the hard limit (2147483647), consider increasing the heap size limit
APT_ParallelSortMergeOperator,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed. [error_handling/exception.C:132]
APT_CombinedOperatorController,3: Fatal Error: Unable to allocate communication resources [iomgr/iomgr.C:227]
node_node1: Player 1 terminated unexpectedly. [processmgr/player.C:160]
APT_CombinedOperatorController,1: Fatal Error: Unable to allocate communication resources [iomgr/iomgr.C:227]
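For context, the soft/hard limits in the second message sit right at the 2 GB ceiling of a 32-bit process heap on AIX. Assuming 32-bit segment sizing (each segment is 256 MB, and LDR_CNTRL=MAXDATA=0x80000000 dedicates the maximum 8 segments to the heap), a quick sanity check of the arithmetic:

```shell
# 32-bit AIX carves the address space into 256 MB segments; MAXDATA=0x80000000
# dedicates 8 of them to the heap, i.e. 2 GB -- matching (to within a byte)
# the 2147483647 hard limit reported in the log above.
segment=$(( 256 * 1024 * 1024 ))
maxdata=$(( 0x80000000 ))
echo "heap segments: $(( maxdata / segment ))"   # prints: heap segments: 8
echo "heap bytes: ${maxdata}"                    # prints: heap bytes: 2147483648
```

So the job is already at the largest heap a 32-bit process can get, which is why the framework suggests raising a limit that cannot actually go any higher.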
One suggestion we received was to split the data and load it in pieces, or to change the job design. I don't think DataStage should fail at this volume.
Moreover, the same job runs successfully in another environment with 46.5+ million records and an 11.55 GB dataset.
Can you please help me with this?
Regards,
Aj
Heap size error in Generic job
At first glance it appears that you are using a Sort/Merge collector (on the input to the Teradata stage?) and have wildly different row counts coming from the different partitions. You claim to be using Round Robin, but there's an APT_ParallelSortMergeOperator featuring prominently in the error messages. Perhaps dumping the score will give you a better idea just what is going on.
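For anyone following along, the score dump Ray mentions is enabled with a single environment variable, set at the project or job level. A minimal sketch, assuming you add it as a job-level environment variable:

```shell
# Enable the parallel framework's score dump; the job log will then contain a
# "main_program: This step has N datasets ..." entry listing the actual operators,
# partitioners and collectors inserted at run time -- including any
# APT_ParallelSortMergeOperator the framework added behind your back.
APT_DUMP_SCORE=True
export APT_DUMP_SCORE
```

Comparing the score from the failing environment with the one where the job succeeds should show whether the framework is inserting a sort/merge in one place but not the other.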
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Heap size error in Generic job
Hello,
We ran into similar trouble. Some heap allocation limitations seem to come from the setting of the AIX environment variable LDR_CNTRL.
You might check how it is set in your "dsenv" file.
This link may also be useful:
http://www-01.ibm.com/support/docview.w ... wg21411997
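For reference, a minimal sketch of what such a setting might look like in dsenv. The 0x80000000 value is the maximum for a 32-bit process, as discussed elsewhere in this thread; treat it as an example, not a prescription for your environment:

```shell
# In $DSHOME/dsenv -- grant 32-bit DataStage processes the maximum 2 GB heap
# (8 x 256 MB data segments on 32-bit AIX).
LDR_CNTRL=MAXDATA=0x80000000
export LDR_CNTRL
```

Note that dsenv is sourced when the engine starts, so changes typically require a DataStage engine restart to take effect.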
Regards,
Daniel
Re: Heap size error in Generic job
We do have LDR_CNTRL set in the job. The value is set to 0x80000000 as suggested on the IBM support site.
Ray,
Yes, I think the round robin used in the Teradata stage is causing DataStage to insert a sort/merge internally during processing. At the moment I cannot change this and rerun the job, as it is in production.
And the same job is definitely running fine in a different environment with 46.5+ million records.
Re: Heap size error in Generic job
aj wrote: We do have LDR_CNTRL set in the job. The value is set to 0x80000000 as suggested on the IBM support site. [...]
Please check these settings:
APT_DEFAULT_TRANSPORT_BLOCK_SIZE = 131072
APT_PHYSICAL_DATASET_BLOCK_SIZE = NULL
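To expand on that suggestion: APT_DEFAULT_TRANSPORT_BLOCK_SIZE controls the size of the blocks used to move records between player processes, and each virtual link buffers blocks of that size, so a smaller value reduces per-link memory pressure on a heap-constrained job. A sketch of trying the values suggested above as job-level environment settings (the 131072 figure comes from this thread, not from any guarantee it fits your job):

```shell
# Use 128 KB transport blocks between player processes to cut per-link
# buffer memory in a heap-constrained 32-bit job.
APT_DEFAULT_TRANSPORT_BLOCK_SIZE=131072
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE

# "NULL" above means: leave this unset so the framework uses its default
# physical dataset block size.
unset APT_PHYSICAL_DATASET_BLOCK_SIZE
```

As with any framework tuning variable, verify the effect by rerunning with the score dump and monitor data rather than assuming the change helped.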