slow read from dataset - combinability mode issue?
Hi,
I have a job where one input to the join is a sequential file and the other is a dataset (don't ask why; I wonder myself why the sequential file isn't also a dataset, as surely that would be more efficient?)
The sequential side reads in 500k rows in seconds, but the dataset side of the join is reading in at just 66 rows/sec, which is surely very low? It seems to be a real bottleneck in the job. I've also noticed that the combinability mode on the Join stage is set to "Don't Combine", but I can't see any apparent reason why.
I've already searched on here and read about the Combinability Mode, but it didn't reveal much more than the Advanced PX Developer's Guide did.
I hope that made sense... any input appreciated!
Cheers,
M
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Are you using the same configuration file and the same partitioning when reading the Data Set that were used when it was written? If not you are incurring the cost of repartitioning these data, as well as of partitioning the sequential file data (which is unavoidable for parallel execution).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
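The repartitioning cost Ray describes can be sketched with a toy model (the node count matches the job, but the key values and the sum-of-ordinals hash are purely illustrative stand-ins for APT_HashPartitioner): re-reading a dataset with Same partitioning moves no rows, while hashing on a different key on the way back in forces rows across nodes.

```python
# Toy model of repartitioning cost when re-reading a 4-way dataset.
# Keys and the hash function are illustrative, not taken from the job above.

NODES = 4

def hash_partition(key: str, nodes: int = NODES) -> int:
    """Assign a row to a node by hashing its key (stand-in for APT_HashPartitioner)."""
    return sum(ord(c) for c in key) % nodes

def rows_moved(rows, read_key=None):
    """Count rows that must cross nodes on re-read.

    rows: list of (written_node, key) pairs.
    read_key=None models 'Same' partitioning: every row stays where it was written.
    Otherwise rows are re-hashed on read_key(key), and any mismatch means a move.
    """
    moved = 0
    for written_node, key in rows:
        target = written_node if read_key is None else hash_partition(read_key(key))
        if target != written_node:
            moved += 1
    return moved

# Simulate a dataset written 4-way, hashed on its full key.
rows = [(hash_partition(f"AGG_{i:05d}"), f"AGG_{i:05d}") for i in range(1000)]

same_cost = rows_moved(rows)                             # 'Same' partitioning: nothing moves
rehash_cost = rows_moved(rows, read_key=lambda k: k[4:])  # hash on a different key: rows move
```

With Same partitioning the read is a straight per-partition scan; any other partitioner pays for moving data between nodes on top of the I/O.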
When reading datasets it's important to keep in mind what is actually happening. Datasets are not sequential files: like a partitioned database, they consist of separate partitions that should be on separate devices, or at least separate file systems. If you monitor the job, you should be able to see your throughput by partition/stream, and this might give you a clue. Check your configuration file. Are you using the configuration file saved with the dataset, or another? If the configuration file you're using does not match the dataset, you could see bad performance or even a crash.
You're reading this into a join....is the dataset sorted by key? Is the sort key also the partition key of the dataset?
Do you have more partitions than CPUs? What's your page/swap rate? Are you buffering? We had a job that ran for 15 minutes until we specified buffering and now it runs in 4 minutes. Etc. etc.
Ande
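Ande's suggestion to look at throughput by partition can be turned into a quick skew check: given per-partition row counts from the job monitor (the figures below are made up, not from this job), compute each partition's share and the max/min ratio.

```python
# Hypothetical per-partition row counts, as might be read off the job monitor.
# A balanced 4-way hash partition should put roughly 25% in each partition.

def partition_skew(counts):
    """Return each partition's share of the total and the max/min imbalance ratio."""
    total = sum(counts)
    shares = [c / total for c in counts]
    ratio = max(counts) / max(min(counts), 1)
    return shares, ratio

balanced = [525_000, 524_000, 526_000, 525_000]   # ~2.1M rows, evenly spread
skewed   = [1_900_000, 70_000, 70_000, 60_000]    # one hot partition doing most of the work

_, ok_ratio = partition_skew(balanced)
_, bad_ratio = partition_skew(skewed)
```

A ratio near 1 means the partitions are pulling their weight equally; a large ratio means one partition is the bottleneck regardless of how many nodes the job runs on.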
Thanks for the replies so far. In response:
- It's an inner join and both inputs are partitioned/sorted on the join key already
- We run on 4 nodes, that is, the data is 4-way partitioned, operating across 10 CPUs
- The buffering on the join stage is set to 'Default'
- The dataset metadata is just 3 columns... approx 70 bytes per record across 2.1 million records in all
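The figures quoted above make the scale of the problem easy to quantify: 2.1 million records at roughly 70 bytes each is only about 140 MB, yet at the observed 66 rows/sec the read would take most of nine hours. A back-of-the-envelope check:

```python
# Back-of-the-envelope check using the figures quoted in the post.
RECORDS = 2_100_000
BYTES_PER_RECORD = 70
ROWS_PER_SEC = 66          # observed rate on the dataset link

total_mb = RECORDS * BYTES_PER_RECORD / (1024 * 1024)
hours_at_observed_rate = RECORDS / ROWS_PER_SEC / 3600
```

140 MB is a trivial volume for a parallel read, which is why 66 rows/sec points at something structural (partitioning, configuration, or a misleading metric) rather than raw I/O capacity.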
Cheers Ray - I've checked and the score shows none of these being added:
main_program: This step has 8 datasets:
ds0: {/gcdm/prd/workingarea/MI/Post-results/LCDM_2875_MI_POSTRESULTS/ALGO_SA_CONTR_2875_DS
[pp] eSame=>eCollectAny
op1[4p] (parallel SAContr_Read_DS)}
ds1: {op0[1p] (sequential TrDep_Read_FF)
eAny<>eCollectAny
op2[4p] (parallel APT_TransformOperatorImplV0S3_GleamMIPostAlgoTrDepDervJob_TrDepDerv_Tfp in TrDepDerv_Tfp)}
ds2: {op1[4p] (parallel SAContr_Read_DS)
[pp] eSame=>eCollectAny
op4[4p] (parallel buffer(0))}
ds3: {op2[4p] (parallel APT_TransformOperatorImplV0S3_GleamMIPostAlgoTrDepDervJob_TrDepDerv_Tfp in TrDepDerv_Tfp)
eOther(APT_HashPartitioner { key={ value=AGG_ID,
subArgs={ cs }
}
})#>eCollectAny
op3[4p] (parallel TrDep_Srt)}
ds4: {op3[4p] (parallel TrDep_Srt)
[pp] eSame=>eCollectAny
op5[4p] (parallel buffer(1))}
ds5: {op4[4p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op6[4p] (parallel APT_JoinSubOperator in TrDep_Jon)}
ds6: {op5[4p] (parallel buffer(1))
[pp] eSame=>eCollectAny
op6[4p] (parallel APT_JoinSubOperator in TrDep_Jon)}
ds7: {op6[4p] (parallel APT_JoinSubOperator in TrDep_Jon)
eOther(APT_DB2Partitioner {})#>eCollectAny
op7[10p] (parallel MIResults_Update_EETab)}
It has 8 operators:
op0[1p] {(sequential TrDep_Read_FF)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel SAContr_Read_DS)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}
op2[4p] {(parallel APT_TransformOperatorImplV0S3_GleamMIPostAlgoTrDepDervJob_TrDepDerv_Tfp in TrDepDerv_Tfp)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
node4[op2,p3]
)}
op3[4p] {(parallel TrDep_Srt)
on nodes (
node1[op3,p0]
node2[op3,p1]
node3[op3,p2]
node4[op3,p3]
)}
op4[4p] {(parallel buffer(0))
on nodes (
node1[op4,p0]
node2[op4,p1]
node3[op4,p2]
node4[op4,p3]
)}
op5[4p] {(parallel buffer(1))
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}
op6[4p] {(parallel APT_JoinSubOperator in TrDep_Jon)
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}
op7[10p] {(parallel MIResults_Update_EETab)
on nodes (
db2node[op7,p0]
db2node[op7,p1]
db2node[op7,p2]
db2node[op7,p3]
db2node[op7,p4]
db2node[op7,p5]
db2node[op7,p6]
db2node[op7,p7]
db2node[op7,p8]
db2node[op7,p9]
)}
Check partitioning when creating the dataset
Hi
Have you also checked the partitioning type used when the dataset was created? This is just to ensure there is no repartitioning when the data is being read. I am just suggesting...
miwinter wrote: ...both inputs are partitioned/sorted on the join key already
I meant this in reference to the job which produces the dataset, which is then used as an input to the job that is slow reading it. The sequential file input to the 'problem job' is partitioned/sorted within that job.
op4 and op5 look like buffers to me. These have probably been inserted to avoid deadlocks due to different throughput rates. The Sequential File stage (import operator) uses the C I/O STREAMS module, and is very fast compared to all other read mechanisms.
Also, be very wary of rows/sec as a metric; there are lots of reasons it can be misleading.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
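Ray's warning about rows/sec can be illustrated with hypothetical numbers: two links sustaining exactly the same data volume per second report wildly different rows/sec simply because their row widths differ.

```python
# Two hypothetical links moving the same volume of data per second.
def rows_per_sec(bytes_per_sec, row_width):
    """Rows/sec implied by a sustained byte rate and a fixed row width."""
    return bytes_per_sec / row_width

VOLUME = 10 * 1024 * 1024   # both links sustain 10 MB/s

narrow = rows_per_sec(VOLUME, 70)     # 70-byte rows, like the dataset above
wide   = rows_per_sec(VOLUME, 7000)   # a hypothetical 7 KB row
```

The narrow link reports one hundred times the rows/sec of the wide one while doing exactly the same amount of I/O, which is why bytes/sec per partition is usually the more honest metric.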
What happens if you allow the partitioning from the Data Set to be (Auto) rather than Same? Do you get a repartitioning icon on the link? (You don't need to re-run the job - just change the job, then exit without saving, to answer this question.)
The inserted buffer operators are there to attempt to keep pipeline parallelism happening. Each is 3MB by default, which should be adequate unless you've got really wide rows. They are tunable, but this should be the very last thing tuned.
Do you have explicit sorts on the input links to the Join stage? If not, one thing that might help is explicit Sort stages, with the Sort Mode set to "Don't sort (previously sorted)".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
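Ray's point that the default 3 MB buffers should be adequate is easy to sanity-check against the record size quoted earlier in the thread (taking roughly 70 bytes per record as a working assumption):

```python
# How many rows fit in one default 3 MB inserted buffer operator?
BUFFER_BYTES = 3 * 1024 * 1024
ROW_WIDTH = 70              # approximate record size quoted earlier in the thread

rows_buffered = BUFFER_BYTES // ROW_WIDTH
```

Tens of thousands of rows per buffer is plenty of slack to keep pipeline parallelism going with rows this narrow, which supports leaving buffer tuning until last.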