slow read from dataset - combinability mode issue?
Hi,
I have a job where one input to the join is a sequential file and the other is a dataset (don't ask why; I wonder myself why the sequential file isn't also a dataset, as surely that would be more efficient?)
The sequential side reads in 500k rows in seconds, but the dataset side of the join is reading in at just 66 rows/sec, which is surely very low? It seems to be a real bottleneck in the job. I've also noticed that the combinability mode on the Join stage is set to "Don't Combine", but I can't see any apparent reason why.
I've already searched on here and read about the Combinability Mode, but it didn't reveal much more than the Advanced PX Developer's Guide did.
I hope that made sense... any input appreciated!
Cheers,
M
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Are you using the same configuration file and the same partitioning when reading the Data Set that were used when it was written? If not you are incurring the cost of repartitioning these data, as well as of partitioning the sequential file data (which is unavoidable for parallel execution).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
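The repartitioning cost Ray describes can be sketched with a toy model (the node count matches the job, but the key values and the sum-of-ordinals hash are purely illustrative stand-ins for APT_HashPartitioner): re-reading a dataset with Same partitioning moves no rows, while hashing on a different key on the way back in forces rows across nodes.

```python
# Toy model of repartitioning cost when re-reading a 4-way dataset.
# Keys and the hash function are illustrative, not taken from the job above.

NODES = 4

def hash_partition(key: str, nodes: int = NODES) -> int:
    """Assign a row to a node by hashing its key (stand-in for APT_HashPartitioner)."""
    return sum(ord(c) for c in key) % nodes

def rows_moved(rows, read_key=None):
    """Count rows that must cross nodes on re-read.

    rows: list of (written_node, key) pairs.
    read_key=None models 'Same' partitioning: every row stays where it was written.
    Otherwise rows are re-hashed on read_key(key), and any mismatch means a move.
    """
    moved = 0
    for written_node, key in rows:
        target = written_node if read_key is None else hash_partition(read_key(key))
        if target != written_node:
            moved += 1
    return moved

# Simulate a dataset written 4-way, hashed on its full key.
rows = [(hash_partition(f"AGG_{i:05d}"), f"AGG_{i:05d}") for i in range(1000)]

same_cost = rows_moved(rows)                             # 'Same' partitioning: nothing moves
rehash_cost = rows_moved(rows, read_key=lambda k: k[4:])  # hash on a different key: rows move
```

With Same partitioning the read is a straight per-partition scan; any other partitioner pays for moving data between nodes on top of the I/O.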
When reading datasets it's important to keep in mind what is actually happening. Datasets are not sequential files: like a partitioned database, they consist of separate partitions that should be on separate devices, or at least separate file systems. If you monitor the job, you should be able to see your throughput by partition/stream, and this might give you a clue. Check your configuration file. Are you using the configuration file saved with the dataset, or another? If the configuration file you're using does not match the dataset, you could see bad performance or even a crash.
You're reading this into a join....is the dataset sorted by key? Is the sort key also the partition key of the dataset?
Do you have more partitions than CPUs? What's your page/swap rate? Are you buffering? We had a job that ran for 15 minutes until we specified buffering and now it runs in 4 minutes. Etc. etc.
Ande
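Ande's suggestion to look at throughput by partition can be turned into a quick skew check: given per-partition row counts from the job monitor (the figures below are made up, not from this job), compute each partition's share and the max/min ratio.

```python
# Hypothetical per-partition row counts, as might be read off the job monitor.
# A balanced 4-way hash partition should put roughly 25% in each partition.

def partition_skew(counts):
    """Return each partition's share of the total and the max/min imbalance ratio."""
    total = sum(counts)
    shares = [c / total for c in counts]
    ratio = max(counts) / max(min(counts), 1)
    return shares, ratio

balanced = [525_000, 524_000, 526_000, 525_000]   # ~2.1M rows, evenly spread
skewed   = [1_900_000, 70_000, 70_000, 60_000]    # one hot partition doing most of the work

_, ok_ratio = partition_skew(balanced)
_, bad_ratio = partition_skew(skewed)
```

A ratio near 1 means the partitions are pulling their weight equally; a large ratio means one partition is the bottleneck regardless of how many nodes the job runs on.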
Thanks for the replies so far. In response:
- It's an inner join and both inputs are partitioned/sorted on the join key already
- We run on 4 nodes, that is, the data is 4-way partitioned, operating across 10 CPUs
- The buffering on the join stage is set to 'Default'
- The dataset metadata is just 3 columns... approx 70 bytes per record across 2.1 million records in all
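The figures quoted above make the scale of the problem easy to quantify: 2.1 million records at roughly 70 bytes each is only about 140 MB, yet at the observed 66 rows/sec the read would take most of nine hours. A back-of-the-envelope check:

```python
# Back-of-the-envelope check using the figures quoted in the post.
RECORDS = 2_100_000
BYTES_PER_RECORD = 70
ROWS_PER_SEC = 66          # observed rate on the dataset link

total_mb = RECORDS * BYTES_PER_RECORD / (1024 * 1024)
hours_at_observed_rate = RECORDS / ROWS_PER_SEC / 3600
```

140 MB is a trivial volume for a parallel read, which is why 66 rows/sec points at something structural (partitioning, configuration, or a misleading metric) rather than raw I/O capacity.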
Cheers Ray - I've checked and the score shows none of these being added:
main_program: This step has 8 datasets:
ds0: {/gcdm/prd/workingarea/MI/Post-results/LCDM_2875_MI_POSTRESULTS/ALGO_SA_CONTR_2875_DS
[pp] eSame=>eCollectAny
op1[4p] (parallel SAContr_Read_DS)}
ds1: {op0[1p] (sequential TrDep_Read_FF)
eAny<>eCollectAny
op2[4p] (parallel APT_TransformOperatorImplV0S3_GleamMIPostAlgoTrDepDervJob_TrDepDerv_Tfp in TrDepDerv_Tfp)}
ds2: {op1[4p] (parallel SAContr_Read_DS)
[pp] eSame=>eCollectAny
op4[4p] (parallel buffer(0))}
ds3: {op2[4p] (parallel APT_TransformOperatorImplV0S3_GleamMIPostAlgoTrDepDervJob_TrDepDerv_Tfp in TrDepDerv_Tfp)
eOther(APT_HashPartitioner { key={ value=AGG_ID,
subArgs={ cs }
}
})#>eCollectAny
op3[4p] (parallel TrDep_Srt)}
ds4: {op3[4p] (parallel TrDep_Srt)
[pp] eSame=>eCollectAny
op5[4p] (parallel buffer(1))}
ds5: {op4[4p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op6[4p] (parallel APT_JoinSubOperator in TrDep_Jon)}
ds6: {op5[4p] (parallel buffer(1))
[pp] eSame=>eCollectAny
op6[4p] (parallel APT_JoinSubOperator in TrDep_Jon)}
ds7: {op6[4p] (parallel APT_JoinSubOperator in TrDep_Jon)
eOther(APT_DB2Partitioner {})#>eCollectAny
op7[10p] (parallel MIResults_Update_EETab)}
It has 8 operators:
op0[1p] {(sequential TrDep_Read_FF)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel SAContr_Read_DS)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}
op2[4p] {(parallel APT_TransformOperatorImplV0S3_GleamMIPostAlgoTrDepDervJob_TrDepDerv_Tfp in TrDepDerv_Tfp)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
node4[op2,p3]
)}
op3[4p] {(parallel TrDep_Srt)
on nodes (
node1[op3,p0]
node2[op3,p1]
node3[op3,p2]
node4[op3,p3]
)}
op4[4p] {(parallel buffer(0))
on nodes (
node1[op4,p0]
node2[op4,p1]
node3[op4,p2]
node4[op4,p3]
)}
op5[4p] {(parallel buffer(1))
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}
op6[4p] {(parallel APT_JoinSubOperator in TrDep_Jon)
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}
op7[10p] {(parallel MIResults_Update_EETab)
on nodes (
db2node[op7,p0]
db2node[op7,p1]
db2node[op7,p2]
db2node[op7,p3]
db2node[op7,p4]
db2node[op7,p5]
db2node[op7,p6]
db2node[op7,p7]
db2node[op7,p8]
db2node[op7,p9]
)}
Check partitioning when creating the dataset
Hi
Have you also checked the partitioning type used when the dataset was created? This is just to ensure there is no repartitioning when the data is being read. I am just suggesting...
miwinter wrote: ...both inputs are partitioned/sorted on the join key already
I meant this in reference to the job which produces the dataset, which is then used as an input to the job that is slow reading it. The sequential file input to the 'problem job' is partitioned/sorted within that job.
op4 and op5 look like buffers to me. These have probably been inserted to avoid deadlocks due to different throughput rates. The Sequential File stage (import operator) uses the C I/O STREAMS module, and is very fast compared to all other read mechanisms.
Also, be very wary of rows/sec as a metric; there are lots of reasons it can be misleading.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
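Ray's warning about rows/sec can be illustrated with hypothetical numbers: two links sustaining exactly the same data volume per second report wildly different rows/sec simply because their row widths differ.

```python
# Two hypothetical links moving the same volume of data per second.
def rows_per_sec(bytes_per_sec, row_width):
    """Rows/sec implied by a sustained byte rate and a fixed row width."""
    return bytes_per_sec / row_width

VOLUME = 10 * 1024 * 1024   # both links sustain 10 MB/s

narrow = rows_per_sec(VOLUME, 70)     # 70-byte rows, like the dataset above
wide   = rows_per_sec(VOLUME, 7000)   # a hypothetical 7 KB row
```

The narrow link reports one hundred times the rows/sec of the wide one while doing exactly the same amount of I/O, which is why bytes/sec per partition is usually the more honest metric.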
What happens if you allow the partitioning from the Data Set to be (Auto) rather than Same? Do you get a repartitioning icon on the link? (You don't need to re-run the job - just change the job, then exit without saving, to answer this question.)
The inserted buffer operators are there to attempt to keep pipeline parallelism happening. Each is 3MB by default, which should be adequate unless you've got really wide rows. They are tunable, but this should be the very last thing tuned.
Do you have explicit sorts on the input links to the Join stage? If not, one thing that might help is explicit Sort stages, with the Sort Mode set to "Don't sort (previously sorted)".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
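Ray's point that the default 3 MB buffers should be adequate is easy to sanity-check against the record size quoted earlier in the thread (taking roughly 70 bytes per record as a working assumption):

```python
# How many rows fit in one default 3 MB inserted buffer operator?
BUFFER_BYTES = 3 * 1024 * 1024
ROW_WIDTH = 70              # approximate record size quoted earlier in the thread

rows_buffered = BUFFER_BYTES // ROW_WIDTH
```

Tens of thousands of rows per buffer is plenty of slack to keep pipeline parallelism going with rows this narrow, which supports leaving buffer tuning until last.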