Heap Size error in Transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Heap Size error in Transformer

Post by just4u_sharath »

I ran my job to extract 9 million records and it aborted with the errors below. I cannot understand what heap size is in this context.
What is the heap size in a Transformer, and how can I overcome this fatal error?

APT_CombinedOperatorController,0: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit

APT_CombinedOperatorController,0: Current heap size: 671491008 bytes in 16654829 blocks

Xfm_FsdbLayout,0: Failure during execution of operator logic.

Xfm_FsdbLayout,0: Input 0 consumed 8320622 records.

Xfm_FsdbLayout,0: Output 0 produced 8320621 records.

APT_CombinedOperatorController,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

node_PFS1_TST-node1: Player 20 terminated unexpectedly.
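For context, the two limits in the first message are ordinary operating-system resource limits on the process's data segment, not a Transformer setting. A minimal sketch, assuming a POSIX shell on the engine host (note that `ulimit -d` reports kilobytes on most shells, while the log reports bytes):

```shell
# Inspect the data segment (heap) limits the log message refers to.
# The soft limit is enforced now; the hard limit is the ceiling the
# soft limit may be raised to without root privileges.
ulimit -Sd   # soft limit on the data segment, in KB on most shells
ulimit -Hd   # hard limit

# Raise the soft limit for this shell and its children, e.g. in the
# environment that starts the DataStage engine. Fall back to the hard
# limit where "unlimited" is not permitted:
ulimit -Sd unlimited 2>/dev/null || ulimit -Sd "$(ulimit -Hd)"
```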
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You have an Aggregator in your job, and you have enabled combinable operators, so the error is reported against the Transformer.
Change the aggregation method to Sort in the Aggregator.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The error is actually thrown by APT_CombinedOperatorController - you have no idea which stage is to blame. That's why we rarely attempt to diagnose errors from this operator. It may even be that the heap problems come from an inserted tsort operator.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

ray.wurlod wrote:The error is actually thrown by APT_CombinedOperatorController - you have no idea which stage is to blame. That's why we rarely attempt to diagnose errors from this operator. It may even be that the heap problems come from an inserted tsort operator.
I am a novice with these kinds of errors.
So please can you let me know how I can debug them and find which stage the actual error is in? There is no Aggregator stage in my job, but there certainly are Sort stages, so it may be a tsort operator failure.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Ray made the right guess (judgment)!
Set the Combinable mode to false in the Transformer's properties. You'll get the error against the right stage!
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

kumar_s wrote:Ray made the right guess (judgment)!
Set the Combinable mode to false in the Transformer's properties. You'll get the error against the right stage!
In my job there is a Lookup stage looking up datasets, and a Transformer whose outputs go to datasets and a Funnel stage. No sorting, no Aggregator, but still the problem persists. Is this error due to my logic in the Transformer, or because of space issues? Replies will be appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Disable operator combination so that you can find out where the error is occurring. Help us to help you. We simply cannot (or at least will not) diagnose errors thrown by an arbitrary number of stages combined into the one process.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
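For anyone following along: operator combination can also be switched off for a whole run, rather than per stage, via an engine environment variable set as a job parameter or project default in the Administrator. A sketch:

```shell
# Disable operator combination for this run, so each stage runs as its
# own player process and reports errors under its own stage name.
export APT_DISABLE_COMBINATION=True
```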
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

ray.wurlod wrote:Disable operator combination so that you can find out where the error is occurring. Help us to help you. We simply can not (or at least will not) diagnose errors thrown by an arbitrary number of stages combined into the one process.
I did as you said: I disabled the combinability property in all stages of that job. The error is still shown against the Transformer. I cannot figure out where this heap consumption in the Transformer is taking place. There is no sorting in the Transformer; I mean, sort order is not preserved.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Capture and inspect the score - the script that is actually executed, rather than the generated osh.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
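The score and the per-player heap and timing detail discussed in this thread are enabled by reporting environment variables; a sketch of setting them for a run (they can equally be added as job parameters):

```shell
# Make the job log include the parallel score and per-player detail.
export APT_DUMP_SCORE=True         # print the score (operators, datasets) to the log
export APT_PM_PLAYER_MEMORY=True   # report each player's heap growth
export APT_PM_PLAYER_TIMING=True   # report each player's CPU time
```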
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

ray.wurlod wrote:Capture and inspect the score - the script that is actually executed, rather than the generated osh.
I have used APT_DUMP_SCORE, APT_PM_PLAYER_MEMORY and APT_PM_PLAYER_TIMING.
The dump score says 44 processes run on 2 nodes.
Heap size was fine at the other stages, but when it came to the Transformer, the initial info was:

Xfm_FsdbLayout,0: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit.

Before this, the info was "Heap growth during runLocally(): 0 bytes".
After the first info, the fatal error is:

Xfm_FsdbLayout,0: Failure during execution of operator logic.

I still cannot find the real problem.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then please post the actual errors (not the ones from APT_CombinedOperatorController).

The score would be used to identify whether DataStage had inserted any tsort or buffer operators. Had it?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

ray.wurlod wrote:Then please post the actual errors (not the ones from APT_CombinedOperatorController).

The score would be used to identify whether DataStage had inserted any tsort or buffer operators. Had it?
Thanks for your reply.
Below I have pasted the info and error messages in the order shown in Director.
I have also pasted the whole dump score.
I can see there are some buffer operators coming in between (from the dump score).

Info: funnel,1: Heap growth during runLocally(): 0 bytes

Info: Lkup_PsCodes,1: When binding input interface field "PRODUCT_TYPE_NAME" to field "PRODUCT_TYPE_NAME": Implicit conversion from source type "string[max=40]" to result type "string[max=30]": Possible truncation of variable length string.

Info: buffer(0),1: Operator completed. status: APT_StatusOk elapsed: 2996.60 user: 10.70 sys: 5.05 (total CPU: 15.75)

Info: buffer(0),1: Heap growth during runLocally(): 0 bytes
Info: Xfm_FsdbLayout,0: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit

Info: Xfm_FsdbLayout,0: Current heap size: 671354528 bytes in 16671885 blocks

Error: Xfm_FsdbLayout,0: Failure during execution of operator logic.

Info: Xfm_FsdbLayout,0: Input 0 consumed 8331251 records.

Info: Xfm_FsdbLayout,0: Output 0 produced 8331250 records.
Output 1 produced 0 records.
Output 2 produced 0 records.
Output 3 produced 0 records.
Output 4 produced 11213 records.
Output 5 produced 0 records.

Error: Xfm_FsdbLayout,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

Error: node_PFS1_TST-node1: Player 11 terminated unexpectedly.


Below is the dump score:

main_program: This step has 40 datasets:
ds0: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_FDSPreppedEqmn1.ds
eAny=>eCollectAny
op0[2p] (parallel dsFDSPreppedEqmn1)}
ds1: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_FDSPreppedNeqmn1.ds
[pp] eSame=>eCollectAny
op1[2p] (parallel dsFDSPreppedNeqmn1)}
ds2: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_MktSegCdCap.ds
eAny=>eCollectAny
op3[2p] (parallel MktSegCd)}
ds3: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_SourceCap.ds
eAny=>eCollectAny
op4[2p] (parallel Source)}
ds4: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_DataTypCdCap.ds
eAny=>eCollectAny
op5[2p] (parallel DataTypCd)}
ds5: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_DstrbStsCdCap.ds
eAny=>eCollectAny
op6[2p] (parallel Dstrb_Sts_Cd)}
ds6: {/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_UnetPrdctCdCap.ds
eAny=>eCollectAny
op7[2p] (parallel Unet_Prdct_Cd)}
ds7: {op0[2p] (parallel dsFDSPreppedEqmn1)
eSame=>eCollectAny
op2[2p] (parallel funnel)}
ds8: {op1[2p] (parallel dsFDSPreppedNeqmn1)
[pp] eSame=>eCollectAny
op2[2p] (parallel funnel)}
ds9: {op2[2p] (parallel funnel)
[pp] eSame=>eCollectAny
op9[2p] (parallel buffer(0))}
ds10: {op3[2p] (parallel MktSegCd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds11: {op4[2p] (parallel Source)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds12: {op5[2p] (parallel DataTypCd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds13: {op6[2p] (parallel Dstrb_Sts_Cd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds14: {op7[2p] (parallel Unet_Prdct_Cd)
eEntire>>eCollectAny
op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)}
ds15: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eEntire<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds16: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds17: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds18: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds19: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds20: {op8[1p] (parallel APT_LUTCreateOp in Lkup_PsCodes)
eAny<>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds21: {op9[2p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)}
ds22: {op10[2p] (parallel APT_LUTProcessOp in Lkup_PsCodes)
[pp] eSame=>eCollectAny
op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)}
ds23: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op12[2p] (parallel buffer(1))}
ds24: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op13[2p] (parallel buffer(2))}
ds25: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op14[2p] (parallel buffer(3))}
ds26: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op15[2p] (parallel dsFDS020PreppedFsdblayout)}
ds27: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op16[2p] (parallel buffer(4))}
ds28: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op17[2p] (parallel buffer(5))}
ds29: {op11[2p] (parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
[pp] eSame=>eCollectAny
op18[2p] (parallel buffer(6))}
ds30: {op12[2p] (parallel buffer(1))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds31: {op13[2p] (parallel buffer(2))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds32: {op14[2p] (parallel buffer(3))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds33: {op16[2p] (parallel buffer(4))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds34: {op17[2p] (parallel buffer(5))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds35: {op18[2p] (parallel buffer(6))
[pp] eSame=>eCollectAny
op19[2p] (parallel funnel_codeset_errors)}
ds36: {op19[2p] (parallel funnel_codeset_errors)
eOther(APT_HashPartitioner { key={ value=ERR_SEQ_NBR_1,
subArgs={ asc }
},
key={ value=ERR_FLD_NM_1,
subArgs={ cs, asc }
},
key={ value=ERR_FLD_VAL_1,
subArgs={ cs, asc }
}
})#>eCollectAny
op20[2p] (parallel err_codeset.to_err_codeset_Sort)}
ds37: {op20[2p] (parallel err_codeset.to_err_codeset_Sort)
eSame=>eCollectAny
op21[2p] (parallel buffer(7))}
ds38: {op21[2p] (parallel buffer(7))
>>eCollectOther(APT_SortedMergeCollector { key={ value=ERR_SEQ_NBR_1,
subArgs={ asc }
},
key={ value=ERR_FLD_NM_1,
subArgs={ cs, asc }
},
key={ value=ERR_FLD_VAL_1,
subArgs={ cs, asc }
}
})
op22[1p] (sequential APT_RealFileExportOperator in err_codeset)}
ds39: {op15[2p] (parallel dsFDS020PreppedFsdblayout)
[pp] =>
/etldata/pfs1/fds/tstage/ds_FDS0SequencerCapitation_FDSPreppedFsdbLayout.ds}
It has 23 operators:
op0[2p] {(parallel dsFDSPreppedEqmn1)
on nodes (
PFS1_TST-node1[op0,p0]
PFS1_TST-node2[op0,p1]
)}
op1[2p] {(parallel dsFDSPreppedNeqmn1)
on nodes (
PFS1_TST-node1[op1,p0]
PFS1_TST-node2[op1,p1]
)}
op2[2p] {(parallel funnel)
on nodes (
PFS1_TST-node1[op2,p0]
PFS1_TST-node2[op2,p1]
)}
op3[2p] {(parallel MktSegCd)
on nodes (
PFS1_TST-node1[op3,p0]
PFS1_TST-node2[op3,p1]
)}
op4[2p] {(parallel Source)
on nodes (
PFS1_TST-node1[op4,p0]
PFS1_TST-node2[op4,p1]
)}
op5[2p] {(parallel DataTypCd)
on nodes (
PFS1_TST-node1[op5,p0]
PFS1_TST-node2[op5,p1]
)}
op6[2p] {(parallel Dstrb_Sts_Cd)
on nodes (
PFS1_TST-node1[op6,p0]
PFS1_TST-node2[op6,p1]
)}
op7[2p] {(parallel Unet_Prdct_Cd)
on nodes (
PFS1_TST-node1[op7,p0]
PFS1_TST-node2[op7,p1]
)}
op8[1p] {(parallel APT_LUTCreateOp in Lkup_PsCodes)
on nodes (
PFS1_TST-node1[op8,p0]
)}
op9[2p] {(parallel buffer(0))
on nodes (
PFS1_TST-node1[op9,p0]
PFS1_TST-node2[op9,p1]
)}
op10[2p] {(parallel APT_LUTProcessOp in Lkup_PsCodes)
on nodes (
PFS1_TST-node1[op10,p0]
PFS1_TST-node2[op10,p1]
)}
op11[2p] {(parallel APT_TransformOperatorImplV0S47_FDS20XfmCapitation_Xfm_FsdbLayout in Xfm_FsdbLayout)
on nodes (
PFS1_TST-node1[op11,p0]
PFS1_TST-node2[op11,p1]
)}
op12[2p] {(parallel buffer(1))
on nodes (
PFS1_TST-node1[op12,p0]
PFS1_TST-node2[op12,p1]
)}
op13[2p] {(parallel buffer(2))
on nodes (
PFS1_TST-node1[op13,p0]
PFS1_TST-node2[op13,p1]
)}
op14[2p] {(parallel buffer(3))
on nodes (
PFS1_TST-node1[op14,p0]
PFS1_TST-node2[op14,p1]
)}
op15[2p] {(parallel dsFDS020PreppedFsdblayout)
on nodes (
PFS1_TST-node1[op15,p0]
PFS1_TST-node2[op15,p1]
)}
op16[2p] {(parallel buffer(4))
on nodes (
PFS1_TST-node1[op16,p0]
PFS1_TST-node2[op16,p1]
)}
op17[2p] {(parallel buffer(5))
on nodes (
PFS1_TST-node1[op17,p0]
PFS1_TST-node2[op17,p1]
)}
op18[2p] {(parallel buffer(6))
on nodes (
PFS1_TST-node1[op18,p0]
PFS1_TST-node2[op18,p1]
)}
op19[2p] {(parallel funnel_codeset_errors)
on nodes (
PFS1_TST-node1[op19,p0]
PFS1_TST-node2[op19,p1]
)}
op20[2p] {(parallel err_codeset.to_err_codeset_Sort)
on nodes (
PFS1_TST-node1[op20,p0]
PFS1_TST-node2[op20,p1]
)}
op21[2p] {(parallel buffer(7))
on nodes (
PFS1_TST-node1[op21,p0]
PFS1_TST-node2[op21,p1]
)}
op22[1p] {(sequential APT_RealFileExportOperator in err_codeset)
on nodes (
PFS1_TST-node2[op22,p0]
)}
It runs 44 processes on 2 nodes.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Try changing the field PRODUCT_TYPE_NAME to VarChar/Char(40), or use a substring/trim function to reduce it to 30 characters.
How many rows are being processed?
It could be due to the overhead of the automatic truncation in the link buffer memory.
What is the design of the job?
Are you using a Sorted Merge collector for any stage? Try changing that as well, to see whether the job gets past the error during the run.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There are certainly a lot of Data Sets associated with the Transformer stage, which may explain why the demand for memory is so high.

It will take some time to review the score and reconstruct the actual job that runs.

Please be patient, or attempt it yourself: on paper, create a "link" for each data set and a "stage" for each operator, then use the information in the data sets section to identify what connects to what and, if you're interested, the partitioning algorithms used.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
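Redrawing the job from the score, as described above, can be started mechanically: the score has one `dsN:` entry per link and one `opN[..]` entry per operator. A sketch, using a two-line stand-in for the real score text (`score.txt` is a hypothetical filename):

```shell
# The score pasted earlier lists datasets (links) then operators (stages).
# score.txt is a hypothetical filename standing in for the saved score;
# a two-line sample is generated here so the greps have input:
printf 'ds0: {op0[2p] (parallel funnel)}\nop0[2p] {(parallel funnel)\n' > score.txt
grep -E '^op[0-9]+\[' score.txt   # operators -> "stages" on paper
grep -E '^ds[0-9]+:' score.txt    # datasets  -> "links" on paper
```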