Sorting data - partition design question
Posted: Tue Dec 02, 2008 7:48 pm
Hi,
I have a question about the ideal partitioning strategy for sorting data and then landing it in 2 datasets.
My design is as follows:
(Input data) --> Sort stage --> Copy --> dataset1
                                 |
                                 +-----> dataset2
That is to say, sort the data and then copy the sorted data so that it lands in 2 different datasets.
The Sort stage sorts on 2 keys, namely sortcode (ranging from 1 to 12) and store#.
I have been experimenting with the partitioning to accomplish this, but I don't get what I need. When I use Auto partitioning everywhere in the stages above, the results look similar to this:
sortcode    store number    column1 ... column 'n'
4           A
4           B
4           C
6           X
6           Y
2           U
2           V
1           I
1           J
1           K
4           D
4           E
I basically need all the records with sortcode 1 to come ahead of those with sortcode 2, and so on.
The execution mode of the datasets is set to be parallel.
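For illustration only: the sample output above looks like what one would get if each partition were sorted on its own and the partitions were then read back one after another. The Python sketch below is not DataStage code. The three partitions and their contents are hypothetical, chosen only to reproduce the sample rows, and the sketch assumes that rows are spread over partitions without regard to sortcode.

```python
import heapq

# Hypothetical partition contents (sortcode, store number), picked to match
# the sample rows in the post. The real distribution is an assumption.
partitions = [
    [(4, "C"), (6, "Y"), (4, "A"), (6, "X"), (4, "B")],
    [(2, "V"), (2, "U")],
    [(4, "E"), (1, "J"), (1, "I"), (4, "D"), (1, "K")],
]

# Each partition is sorted independently on (sortcode, store number).
sorted_partitions = [sorted(p) for p in partitions]

# Reading the partitions back to back keeps each one ordered internally,
# but the overall sequence is not globally ordered.
concatenated = [row for p in sorted_partitions for row in p]

# A key-aware merge of the already sorted partitions gives one global order.
merged = list(heapq.merge(*sorted_partitions))

if __name__ == "__main__":
    print("concatenated:", concatenated)
    print("merged:      ", merged)
```

In this sketch the concatenated list has the same sortcode sequence as the sample output, while the merged list is in full sortcode order. Whether this matches what the job is actually doing depends on the partitioning and collection settings in use.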
Any help on how I could make this work with a single Sort stage is greatly appreciated!
Thanks in advance!!