- - When rerunning the job with no changes to the source data, the SCD stage identifies all rows as new inserts.
- This problem does not occur when running on a single node.
- - Input partitioning tab: The dimension link partitioning=Auto; the input link partition=Hash on BAS_NK & SOURCE_SYSTEM_ID, and sorted on BAS_NK & SOURCE_SYSTEM_ID, and SNAPSHOT_TIMESTAMP.
- Lookup tab purpose codes:
Project_Baseline_Key=Surrogate Key
BAS_NK=Business Key
Baseline_Name=Type 2
Description=Type 2
Last_Update=Type 2
Valid_From_Timestamp=Effective Date
Valid_To_Timestamp=Expiration Date
SOURCE_SYSTEM_ID=Business Key
- Lookup tab Key Expressions: Only the BAS_NK & SOURCE_SYSTEM_ID are used, and all other columns are blank (i.e. not used for matching).
- Surrogate Key tab: Uses a DB2 Sequence.
- Dim Update tab:
Project_Baseline_Key=NextSurrogateKey()
BAS_NK=BAS_NK
Baseline_Name=Baseline_Name
Description=Description
Last_Update=Last_Update
Valid_From_Timestamp=SNAPSHOT_TIMESTAMP
Valid_To_Timestamp='2999-12-31 23:59:59.000000'
SOURCE_SYSTEM_ID=SOURCE_SYSTEM_ID
- Output Map: Too many to list; the main thing is the Project_Baseline_Key is passed through, but not the other dimension columns.
- - I had a look at the Score, but do not see the problem. I've reduced the job to just the problem SCD stage, and attached the Score:
Code: Select all
main_program: This step has 12 datasets:
ds0: {/apps/InformationServer/dsadmd/descriptors/TestInput.ds
eAny=>eCollectAny
op3[4p] (parallel dsTestInput)}
ds1: {op0[1p] (sequential db2DimBasOld)
eAny<>eCollectAny
op1[4p] (parallel copyPlaceholder4)}
ds2: {op1[4p] (parallel copyPlaceholder4)
eAny=>eCollectAny
op2[4p] (parallel copyPlaceholder4:RF02Before-FILTERED)}
ds3: {op2[4p] (parallel copyPlaceholder4:RF02Before-FILTERED)
eOther(APT_HashPartitioner { key={ value=BAS_NK },
key={ value=SOURCE_SYSTEM_ID }
})#>eCollectAny
op6[4p] (parallel inserted tsort operator {key={value=BAS_NK, subArgs={asc, nulls={value=first}, cs}}, key={value=SOURCE_SYSTEM_ID, subArgs={asc, nulls={value=first}, cs}}}(0) in scdDimBaseline)}
ds4: {op3[4p] (parallel dsTestInput)
eAny=>eCollectAny
op4[4p] (parallel copyPartitionBasNK)}
ds5: {op4[4p] (parallel copyPartitionBasNK)
eOther(APT_HashPartitioner { key={ value=BAS_NK,
subArgs={ cs }
},
key={ value=SOURCE_SYSTEM_ID,
subArgs={ cs }
}
})#>eCollectAny
op5[4p] (parallel scdDimBaseline.BA10_RepPerd_Sort)}
ds6: {op5[4p] (parallel scdDimBaseline.BA10_RepPerd_Sort)
[pp] eSame=>eCollectAny
op8[4p] (parallel APT_TransformOperatorImplV24S13_Debug01_FactPrjRepPerdPerf_scdDimBaseline in scdDimBaseline)}
ds7: {op6[4p] (parallel inserted tsort operator {key={value=BAS_NK, subArgs={asc, nulls={value=first}, cs}}, key={value=SOURCE_SYSTEM_ID, subArgs={asc, nulls={value=first}, cs}}}(0) in scdDimBaseline)
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_LUTCreateOp in scdDimBaseline)}
ds8: {op7[4p] (parallel APT_LUTCreateOp in scdDimBaseline)
eEntire#>eCollectAny
op8[4p] (parallel APT_TransformOperatorImplV24S13_Debug01_FactPrjRepPerdPerf_scdDimBaseline in scdDimBaseline)}
ds9: {op7[4p] (parallel APT_LUTCreateOp in scdDimBaseline)
eAny=>eCollectAny
op8[4p] (parallel APT_TransformOperatorImplV24S13_Debug01_FactPrjRepPerdPerf_scdDimBaseline in scdDimBaseline)}
ds10: {op8[4p] (parallel APT_TransformOperatorImplV24S13_Debug01_FactPrjRepPerdPerf_scdDimBaseline in scdDimBaseline)
eAny=>eCollectAny
op9[4p] (parallel copyPlaceholder5)}
ds11: {op8[4p] (parallel APT_TransformOperatorImplV24S13_Debug01_FactPrjRepPerdPerf_scdDimBaseline in scdDimBaseline)
[pp] eSame=>eCollectAny
op10[4p] (parallel copyPlaceholder6)}
It has 11 operators:
op0[1p] {(sequential db2DimBasOld)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel copyPlaceholder4)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}
op2[4p] {(parallel copyPlaceholder4:RF02Before-FILTERED)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
node4[op2,p3]
)}
op3[4p] {(parallel dsTestInput)
on nodes (
node1[op3,p0]
node2[op3,p1]
node3[op3,p2]
node4[op3,p3]
)}
op4[4p] {(parallel copyPartitionBasNK)
on nodes (
node1[op4,p0]
node2[op4,p1]
node3[op4,p2]
node4[op4,p3]
)}
op5[4p] {(parallel scdDimBaseline.BA10_RepPerd_Sort)
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}
op6[4p] {(parallel inserted tsort operator {key={value=BAS_NK, subArgs={asc, nulls={value=first}, cs}}, key={value=SOURCE_SYSTEM_ID, subArgs={asc, nulls={value=first}, cs}}}(0) in scdDimBaseline)
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}
op7[4p] {(parallel APT_LUTCreateOp in scdDimBaseline)
on nodes (
node1[op7,p0]
node2[op7,p1]
node3[op7,p2]
node4[op7,p3]
)}
op8[4p] {(parallel APT_TransformOperatorImplV24S13_Debug01_FactPrjRepPerdPerf_scdDimBaseline in scdDimBaseline)
on nodes (
node1[op8,p0]
node2[op8,p1]
node3[op8,p2]
node4[op8,p3]
)}
op9[4p] {(parallel copyPlaceholder5)
on nodes (
node1[op9,p0]
node2[op9,p1]
node3[op9,p2]
node4[op9,p3]
)}
op10[4p] {(parallel copyPlaceholder6)
on nodes (
node1[op10,p0]
node2[op10,p1]
node3[op10,p2]
node4[op10,p3]
)}
It runs 41 processes on 4 nodes.
Since it works on 1 node, my guess is that I've messed up the partitioning somehow. Or is it that we've used '2999-12-31 23:59:59.000000' to indicate the current rows, instead of null?
I'm sure it'll be something simple, and I'll scamper off embarrassed and with my tail between my legs.
But I just can't put my finger on the problem.
Any ideas?