Pre-partitioned, sorted Dataset read very slow
Hi,
I have partitioned and sorted data in a Dataset with approximately 26 million records.
When I use the same Dataset as a source in another job without changing the partitioning or sort order, i.e. using Same partitioning as below:
Dataset (Set) ------> Join (Same partitioning)
For a very long time, the number of records read from this source Dataset stays at "0". I expected that using a pre-partitioned, sorted Dataset as the source would improve performance, but there seems to be no difference: DataStage only reads the Dataset after a very long time.
Is there some setting required for faster reads?
Thanks in advance
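The expectation above rests on the idea that a join over inputs already partitioned and sorted on the key can stream, the way a sort-merge join does. The following is a generic Python sketch of that technique, not DataStage's implementation; the function name merge_join and the tuple row layout are hypothetical. Each input is read once, front to back, and only one key group of the right-hand input is held at a time.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]
_END = (None, None)  # what next() returns once a grouped input is exhausted


def merge_join(left: Iterable[Row], right: Iterable[Row],
               key: int = 0) -> Iterator[Tuple[Row, Row]]:
    """Inner-join two inputs that are ALREADY sorted ascending on column `key`.

    Both inputs are consumed in a single forward pass; only the current key
    group of the right input is buffered. Unsorted input gives wrong results
    silently, which is why a sort is normally required in front of the join.
    """
    get_key = itemgetter(key)
    left_groups = groupby(left, key=get_key)
    right_groups = groupby(right, key=get_key)
    lk, lrows = next(left_groups, _END)
    rk, rrows = next(right_groups, _END)
    while lrows is not None and rrows is not None:
        if lk < rk:
            lk, lrows = next(left_groups, _END)    # no match: skip left group
        elif lk > rk:
            rk, rrows = next(right_groups, _END)   # no match: skip right group
        else:
            right_group = list(rrows)              # buffer one key group only
            for lrow in lrows:
                for rrow in right_group:
                    yield lrow, rrow
            lk, lrows = next(left_groups, _END)
            rk, rrows = next(right_groups, _END)


# Made-up sample rows, both sorted on column 0.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z")]
for pair in merge_join(left, right):
    print(pair)
```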
Rohan
What other stages come after the Join stage? I'm fairly sure this isn't your whole job flow, and the issue could well lie elsewhere.
How long is a "very very long time" for it to read the rows into the Join stage? Bear in mind that it is loading them into memory, so the issue could also be memory contention or contention on the filesystem where the Dataset resides.
Mark Winter
Nothing appeases a troubled mind more than good music
DataStage inserts a sort before the Join, and the Dataset read and that sort are probably combined into one operator. Look at the job's score to determine whether this is the case. I suggest you add a Sort stage and set its Sort Key Mode property to "Don't Sort (Previously Sorted)". This might help.
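The difference between a full sort and relying on an existing order can be pictured with a small Python sketch. The function check_sorted is hypothetical and does not claim to reproduce what DataStage does internally in "Don't Sort (Previously Sorted)" mode; it only shows that handling input already in key order can be a single streaming pass with no scratch files. A full sort, by contrast, is blocking: it cannot emit its first row until it has read its last, which could explain a long pause before rows reach the Join if a sort has been inserted. To see which operators actually run, you can set the APT_DUMP_SCORE environment variable to True so that the score is written to the job log.

```python
from operator import itemgetter
from typing import Any, Callable, Iterable, Iterator


def check_sorted(rows: Iterable[Any], key: Callable[[Any], Any]) -> Iterator[Any]:
    """Yield rows unchanged, raising ValueError at the first out-of-order key.

    Each row is emitted as soon as it is checked, and only the previous key
    is remembered: no buffering and no temporary files.
    """
    have_prev = False
    prev = None
    for i, row in enumerate(rows):
        k = key(row)
        if have_prev and k < prev:
            raise ValueError(f"row {i} out of order: {k!r} after {prev!r}")
        prev, have_prev = k, True
        yield row


# Made-up rows already in order on column 0: they pass straight through.
rows = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
print(list(check_sorted(rows, key=itemgetter(0))))
```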
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
You can achieve the same result on the input Partitioning tab of your Join, where you set the partitioning to Same: select Perform sort and Stable so that it doesn't re-sort your Dataset, which removes the need to add a new stage.
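A stable sort keeps rows with equal keys in their input order, so stably re-sorting data that is already in key order returns it unchanged. The short Python illustration below uses the built-in sorted(), which is guaranteed stable; the sample rows are made up. Note that the rows still pass through a sort, so stability avoids reordering rather than the cost of sorting.

```python
from operator import itemgetter

# Rows already ordered on the join key (column 0); column 1 marks arrival order.
presorted = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)]

# sorted() is stable: rows with equal keys keep their input order,
# so already-ordered input comes back exactly as it went in.
resorted = sorted(presorted, key=itemgetter(0))
assert resorted == presorted

# Stability matters when equal keys are interleaved with other keys:
# within each key, the original relative order is preserved.
mixed = [("B", 1), ("A", 2), ("B", 3), ("A", 4)]
print(sorted(mixed, key=itemgetter(0)))  # [('A', 2), ('A', 4), ('B', 1), ('B', 3)]
```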
Last edited by miwinter on Mon Jun 01, 2009 4:40 am, edited 1 time in total.
Mark Winter
I assume the inline stable sort option still uses scratch disk space. I also have restrictions on scratch space size, which is why I sorted on the Oracle side in my extraction step to generate this partitioned and sorted Dataset.
However, I assume the "Don't Sort (Previously Sorted)" option does not use scratch disk space.
Please correct me if I am wrong.
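The scratch-space assumption is consistent with how large sorts generally work: when the data does not fit in the memory allotted to the sort, sorted runs are written to temporary files and merged afterwards, whereas relying on an existing order needs no such files. The generic external merge sort below sketches this in Python; external_sort, run_size and scratch_dir are hypothetical names, and this is not DataStage's sort implementation.

```python
import heapq
import pickle
import tempfile
from typing import IO, Any, Callable, Iterable, Iterator, List, Optional


def _spill(run: List[Any], scratch_dir: Optional[str]) -> IO[bytes]:
    """Write one sorted run to an anonymous temp file (the 'scratch disk')."""
    f = tempfile.TemporaryFile(dir=scratch_dir)
    for row in run:
        pickle.dump(row, f)
    f.seek(0)
    return f


def _read_run(f: IO[bytes]) -> Iterator[Any]:
    """Stream rows back from a spilled run."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def external_sort(rows: Iterable[Any], key: Callable[[Any], Any],
                  run_size: int = 1000,
                  scratch_dir: Optional[str] = None) -> Iterator[Any]:
    """Sort rows using at most `run_size` rows in memory per run.

    Phase 1 sorts fixed-size runs in memory and spills each to scratch.
    Phase 2 k-way merges the spilled runs. Nothing is yielded until every
    input row has been read and spilled.
    """
    files: List[IO[bytes]] = []
    run: List[Any] = []
    for row in rows:
        run.append(row)
        if len(run) >= run_size:
            run.sort(key=key)
            files.append(_spill(run, scratch_dir))
            run = []
    if run:
        run.sort(key=key)
        files.append(_spill(run, scratch_dir))
    try:
        yield from heapq.merge(*(_read_run(f) for f in files), key=key)
    finally:
        for f in files:
            f.close()  # temp files are deleted on close


print(list(external_sort([5, 3, 9, 1, 7, 2], key=lambda x: x, run_size=2)))
# [1, 2, 3, 5, 7, 9]
```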
Regards
Rohan
OK, so go with the Sort stage, using "Don't Sort (Previously Sorted)" on your sort/join key.
You could also add APT_NO_SORT_INSERTION to your job, but this disables sort insertion at job level, so you would need to handle the sorting requirements of the other stages in your second job manually.
Let us know which way you go and what effect it produces.
Mark Winter