
Heap Size error in Transformer

Posted: Fri Feb 29, 2008 12:45 pm
by just4u_sharath
I ran my job to extract 9 million records and it aborted with the error below. I don't understand what heap size is.
What is heap size in the Transformer, and how can I overcome this fatal error?

APT_CombinedOperatorController,0: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit

APT_CombinedOperatorController,0: Current heap size: 671491008 bytes in 16654829 blocks

Xfm_FsdbLayout,0: Failure during execution of operator logic.

Xfm_FsdbLayout,0: Input 0 consumed 8320622 records.

Xfm_FsdbLayout,0: Output 0 produced 8320621 records.

APT_CombinedOperatorController,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

node_PFS1_TST-node1: Player 20 terminated unexpectedly.
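For context, the "soft limit" and "hard limit" in the first message are the operating-system limits on the process data segment, which can be inspected and raised from the shell. A minimal sketch (bash/ksh; note that `ulimit -d` usually reports kilobytes while the log message reports bytes, and the dsenv placement is an assumption, not a confirmed fix for this job):

```shell
# Inspect the two numbers the engine is comparing:
# soft limit 805306368 vs hard limit 2147483647 bytes in the error above.
ulimit -S -d   # current soft limit on the data segment (heap)
ulimit -H -d   # hard limit (the ceiling a non-root user can raise to)

# Raise the soft limit to the hard limit for this shell and its children.
# Placing this in dsenv before jobs start is one common approach.
ulimit -S -d unlimited 2>/dev/null || ulimit -S -d "$(ulimit -H -d)"
```

Raising the soft limit only buys headroom; it does not explain why one operator is consuming ~670 MB in 16 million blocks, which is what the rest of the thread digs into.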

Posted: Fri Feb 29, 2008 12:54 pm
by kumar_s
You have an Aggregator in your job.
You have enabled combinable operators, so the error is thrown out in the Transformer.
Change the aggregation method to Sort in the Aggregator.

Posted: Fri Feb 29, 2008 3:43 pm
by ray.wurlod
The error is actually thrown by APT_CombinedOperatorController - you have no idea which stage is to blame. That's why we rarely attempt to diagnose errors from this operator. It may even be that the heap problems come from an inserted tsort operator.
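Pinning the blame on a single stage, as suggested above, comes down to turning operator combination off. A sketch using the standard parallel-engine environment variable (the `dsjob` invocation is a hypothetical illustration; project and job names are placeholders):

```shell
# Stop the engine from fusing stages into one APT_CombinedOperatorController
# process, so each log message carries the real stage name instead.
# APT_DISABLE_COMBINATION can also be added as a job parameter in
# Administrator or Designer rather than exported in the environment.
export APT_DISABLE_COMBINATION=True

# Hypothetical command-line run with the variable in effect:
# dsjob -run -mode NORMAL MyProject MyJob
```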

Posted: Fri Feb 29, 2008 7:54 pm
by just4u_sharath
ray.wurlod wrote:The error is actually thrown by APT_CombinedOperatorController - you have no idea which stage is to blame. That's why we rarely attempt to diagnose errors from this operator. It may even be that the heap problems come from an inserted tsort operator.
I am a novice at these kinds of errors.
Could you please let me know how I can debug these kinds of errors and find which stage the actual error is in? There is no Aggregator stage in my job, but there certainly are Sort stages, so it may be a tsort operator failure.

Posted: Fri Feb 29, 2008 9:28 pm
by kumar_s
Ray made the right judgment!
Set the Combinable mode to False in the Transformer's properties. You'll get the error in the right stages!

Posted: Sun Mar 02, 2008 11:56 pm
by just4u_sharath
kumar_s wrote: Ray made the right judgment! Set the Combinable mode to False in the Transformer's properties. You'll get the error in the right stages!
In my job there is a Lookup stage looking up datasets, and a Transformer whose outputs go to datasets and a Funnel stage. No sorting, no Aggregator, but the problem still persists. Is this error due to my logic in the Transformer, or because of a space issue? Replies will be appreciated.

Posted: Mon Mar 03, 2008 12:33 am
by ray.wurlod
Disable operator combination so that you can find out where the error is occurring. Help us to help you. We simply cannot (or at least will not) diagnose errors thrown by an arbitrary number of stages combined into one process.

Posted: Mon Mar 03, 2008 1:48 pm
by just4u_sharath
ray.wurlod wrote: Disable operator combination so that you can find out where the error is occurring. Help us to help you. We simply cannot (or at least will not) diagnose errors thrown by an arbitrary number of stages combined into one process.
I did as you said: I disabled the combinability property in all stages of that job. The error is still shown in the Transformer. I cannot figure out where this heap growth is taking place in the Transformer. There is no sorting in the Transformer (sort order is not preserved).

Posted: Mon Mar 03, 2008 3:41 pm
by ray.wurlod
Capture and inspect the score - the script that is actually executed, rather than the generated osh.
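Capturing the score and the per-player diagnostics the poster mentions next is again a matter of environment variables. A minimal sketch using the standard parallel-engine settings:

```shell
# Ask the engine to write the score into the job log at startup: the
# "main_program: This step has N datasets" report pasted further down.
export APT_DUMP_SCORE=True
# Related diagnostics used later in this thread:
export APT_PM_PLAYER_MEMORY=True    # per-player heap-growth messages
export APT_PM_PLAYER_TIMING=True    # per-player CPU timing messages
```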

Posted: Mon Mar 03, 2008 8:19 pm
by just4u_sharath
ray.wurlod wrote:Capture and inspect the score - the script that is actually executed, rather than the generated osh.
I have used APT_DUMP_SCORE, APT_PM_PLAYER_MEMORY and APT_PM_PLAYER_TIMING.
The dump score says 44 processes run on 2 nodes.
Heap size was fine at the other stages, but when it came to the Transformer, the initial info is:

Xfm_FsdbLayout,0: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit.

Before this, the info is "Heap growth during runLocally(): 0 bytes".
After the first info, the fatal error is:

Xfm_FsdbLayout,0: Failure during execution of operator logic.

I still cannot find the real problem.

Posted: Mon Mar 03, 2008 9:07 pm
by ray.wurlod
Then please post the actual errors (not the ones from APT_CombinedOperatorController).

The score would be used to identify whether DataStage had inserted any tsort or buffer operators. Had it?

Posted: Tue Mar 04, 2008 11:56 am
by just4u_sharath
ray.wurlod wrote:Then please post the actual errors (not the ones from APT_CombinedOperatorController).

The score would be used to identify whether DataStage had inserted any tsort or buffer operators. Had it?
Thanks for your reply.
Below I have pasted the info and error messages in the order shown in Director.
I have also pasted the whole dump score.
I can see there are some buffers coming in between (from the dump score).

Info: funnel,1: Heap growth during runLocally(): 0 bytes

Info: Lkup_PsCodes,1: When binding input interface field "PRODUCT_TYPE_NAME" to field "PRODUCT_TYPE_NAME": Implicit conversion from source type "string[max=40]" to result type "string[max=30]": Possible truncation of variable length string.

Info: buffer(0),1: Operator completed. status: APT_StatusOk elapsed: 2996.60 user: 10.70 sys: 5.05 (total CPU: 15.75)

Info: buffer(0),1: Heap growth during runLocally(): 0 bytes
Xfm_FsdbLayout,0: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit

Info: Xfm_FsdbLayout,0: Current heap size: 671354528 bytes in 16671885 blocks

Error: Xfm_FsdbLayout,0: Failure during execution of operator logic.

Info: Xfm_FsdbLayout,0: Input 0 consumed 8331251 records.

Info: Xfm_FsdbLayout,0: Output 0 produced 8331250 records.
Output 1 produced 0 records.
Output 2 produced 0 records.
Output 3 produced 0 records.
Output 4 produced 11213 records.
Output 5 produced 0 records.

Error: Xfm_FsdbLayout,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

Error: node_PFS1_TST-node1: Player 11 terminated unexpectedly.


Below is the dump score:

main_program: This step has 40 datasets:
ds0: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_FDSPreppedEqmn1.ds
eAny=>eCollectAny
op0[2p] (parallel dsFDSPreppedEqmn1)}
ds1: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_FDSPreppedNeqmn1.ds
[pp] eSame=>eCollectAny
op1[2p] (parallel dsFDSPreppedNeqmn1)}
ds2: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_MktSegCdCap.ds
eAny=>eCollectAny
op3[2p] (parallel MktSegCd)}
ds3: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_SourceCap.ds
eAny=>eCollectAny
op4[2p] (parallel Source)}
ds4: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_DataTypCdCap.ds
eAny=>eCollectAny
op5[2p] (parallel DataTypCd)}
ds5: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_DstrbStsCdCap.ds
eAny=>eCollectAny
op6[2p] (parallel Dstrb_Sts_Cd)}
ds6: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_UnetPrdctCdCap.ds
eAny=>eCollectAny
op7[2p] (parallel Unet_Prdct_Cd)}
ds7: {op0[2p] (parallel dsFDSPreppedEqmn1)
eSame=>eCollectAny
op2[2p] (parallel funnel)}
ds8: {op1[2p] (parallel dsFDSPreppedNeqmn1)
[pp] eSame=>eCollectAny
op2[2p] (parallel funnel)}
ds9: {op2[2p] (parallel funnel)
[pp] eSame=>eCollectAny
op9[2p] (parallel buffer(0))}
ds10: {op3[2p] (parallel MktSegCd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds11: {op4[2p] (parallel Source)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds12: {op5[2p] (parallel DataTypCd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds13: {op6[2p] (parallel Dstrb_Sts_Cd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds14: {op7[2p] (parallel Unet_Prdct_Cd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds15: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eEntire<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds16: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds17: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds18: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds19: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds20: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds21: {op9[2p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds22: {op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)
[pp] eSame=>eCollectAny
op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)}
ds23: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op12[2p] (parallel buffer(1))}
ds24: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op13[2p] (parallel buffer(2))}
ds25: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op14[2p] (parallel buffer(3))}
ds26: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op15[2p] (parallel dsFDS020PreppedFsdblayout)}
ds27: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op16[2p] (parallel buffer(4))}
ds28: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op17[2p] (parallel buffer(5))}
ds29: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op18[2p] (parallel buffer(6))}
ds30: {op12[2p] (parallel buffer(1))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds31: {op13[2p] (parallel buffer(2))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds32: {op14[2p] (parallel buffer(3))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds33: {op16[2p] (parallel buffer(4))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds34: {op17[2p] (parallel buffer(5))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds35: {op18[2p] (parallel buffer(6))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds36: {op19[2p] (parallel funnel_codeset_errors)
eOther(APT_HashPartitioner { key={ value=ERR_SEQ_NBR_1,
subArgs={ asc }
},
key={ value=ERR_FLD_NM_1,
subArgs={ cs, asc }
},
key={ value=ERR_FLD_VAL_1,
subArgs={ cs, asc }
}
})#>eCollectAny
op20[2p] (parallel err_codeset.to_err_codeset_Sort)}
ds37: {op20[2p] (parallel err_codeset.to_err_codeset_Sort)
eSame=>eCollectAny
op21[2p] (parallel buffer(7))}
ds38: {op21[2p] (parallel buffer(7))
>>eCollectOther(APT_SortedMergeCollector { key={ value=ERR_SEQ_NBR_1,
subArgs={ asc }
},
key={ value=ERR_FLD_NM_1,
subArgs={ cs, asc }
},
key={ value=ERR_FLD_VAL_1,
subArgs={ cs, asc }
}
})
op22[1p] (sequential APT_RealFileExportOperator in err_codeset)}
ds39: {op15[2p] (parallel dsFDS020PreppedFsdblayout)
[pp] =>
/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_FDSPreppedFsdbLayout.ds}
It has 23 operators:
op0[2p] {(parallel dsFDSPreppedEqmn1)
on nodes (
PFS1_TST-node1[op0,p0]
PFS1_TST-node2[op0,p1]
)}
op1[2p] {(parallel dsFDSPreppedNeqmn1)
on nodes (
PFS1_TST-node1[op1,p0]
PFS1_TST-node2[op1,p1]
)}
op2[2p] {(parallel funnel)
on nodes (
PFS1_TST-node1[op2,p0]
PFS1_TST-node2[op2,p1]
)}
op3[2p] {(parallel MktSegCd)
on nodes (
PFS1_TST-node1[op3,p0]
PFS1_TST-node2[op3,p1]
)}
op4[2p] {(parallel Source)
on nodes (
PFS1_TST-node1[op4,p0]
PFS1_TST-node2[op4,p1]
)}
op5[2p] {(parallel DataTypCd)
on nodes (
PFS1_TST-node1[op5,p0]
PFS1_TST-node2[op5,p1]
)}
op6[2p] {(parallel Dstrb_Sts_Cd)
on nodes (
PFS1_TST-node1[op6,p0]
PFS1_TST-node2[op6,p1]
)}
op7[2p] {(parallel Unet_Prdct_Cd)
on nodes (
PFS1_TST-node1[op7,p0]
PFS1_TST-node2[op7,p1]
)}
op8[1p] {(parallel APT_LUTCreateOp in Lkup_PsCodes)
on nodes (
PFS1_TST-node1[op8,p0]
)}
op9[2p] {(parallel buffer(0))
on nodes (
PFS1_TST-node1[op9,p0]
PFS1_TST-node2[op9,p1]
)}
op10[2p] {(parallel APT_LUTProcessOp in Lkup_PsCodes)
on nodes (
PFS1_TST-node1[op10,p0]
PFS1_TST-node2[op10,p1]
)}
op11[2p] {(parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
on nodes (
PFS1_TST-node1[op11,p0]
PFS1_TST-node2[op11,p1]
)}
op12[2p] {(parallel buffer(1))
on nodes (
PFS1_TST-node1[op12,p0]
PFS1_TST-node2[op12,p1]
)}
op13[2p] {(parallel buffer(2))
on nodes (
PFS1_TST-node1[op13,p0]
PFS1_TST-node2[op13,p1]
)}
op14[2p] {(parallel buffer(3))
on nodes (
PFS1_TST-node1[op14,p0]
PFS1_TST-node2[op14,p1]
)}
op15[2p] {(parallel dsFDS020PreppedFsdblayout)
on nodes (
PFS1_TST-node1[op15,p0]
PFS1_TST-node2[op15,p1]
)}
op16[2p] {(parallel buffer(4))
on nodes (
PFS1_TST-node1[op16,p0]
PFS1_TST-node2[op16,p1]
)}
op17[2p] {(parallel buffer(5))
on nodes (
PFS1_TST-node1[op17,p0]
PFS1_TST-node2[op17,p1]
)}
op18[2p] {(parallel buffer(6))
on nodes (
PFS1_TST-node1[op18,p0]
PFS1_TST-node2[op18,p1]
)}
op19[2p] {(parallel funnel_codeset_errors)
on nodes (
PFS1_TST-node1[op19,p0]
PFS1_TST-node2[op19,p1]
)}
op20[2p] {(parallel err_codeset.to_err_codeset_Sort)
on nodes (
PFS1_TST-node1[op20,p0]
PFS1_TST-node2[op20,p1]
)}
op21[2p] {(parallel buffer(7))
on nodes (
PFS1_TST-node1[op21,p0]
PFS1_TST-node2[op21,p1]
)}
op22[1p] {(sequential APT_RealFileExportOperator in err_codeset)
on nodes (
PFS1_TST-node2[op22,p0]
)}
It runs 44 processes on 2 nodes.

Posted: Wed Mar 05, 2008 9:03 pm
by kumar_s
Try changing the field PRODUCT_TYPE_NAME to Varchar/Char 40, or use a Substring/Trim function to reduce it to 30.
How many rows are being processed?
It could be due to the overhead of the automatic truncation in the link buffer memory.
What is the design of the job?
Are you using a Sort Merge collector for any stage? Try changing that as well, to see whether the job skips the error during the run.

Posted: Wed Mar 05, 2008 10:14 pm
by ray.wurlod
There are certainly a lot of Data Sets associated with the Transformer stage. This may explain why demand for memory is so high.

It will take some time to review the score and reconstruct the actual job that runs.

Please be patient, or attempt it yourself. On paper, create a "link" for each data set and a "stage" for each operator. Use the information in the data sets section to identify what connects to what and, if you're interested, the partitioning algorithms used.
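The paper walk-through above can be roughed out with standard text tools as well. A sketch over a captured score file (the two-line here-document stands in for the real score text saved from the Director log; `score.txt` is a placeholder name):

```shell
# Build a tiny stand-in for a score saved from Director.
cat > score.txt <<'EOF'
op9[2p] {(parallel buffer(0))
op11[2p] {(parallel APT_TransformOperator in Xfm_FsdbLayout)
EOF

# One line per operator ("stage" on paper), in opN order:
grep -E '^op[0-9]+\[' score.txt

# How many engine-inserted buffer operators surround the Transformer?
grep -c 'buffer(' score.txt    # -> 1 for this sample

# Any inserted sorts to suspect? (none in this sample)
grep -c 'tsort' score.txt || true
```

Counting the `buffer(N)` operators this way makes the point from earlier in the thread concrete: the score here shows seven buffers feeding or draining the Transformer's six outputs, which is where the memory demand concentrates.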