Hi,
I have a question about the ideal partitioning strategy for sorting data and then landing it in two datasets.
My design is as follows:
(Inputdata)-->Sort stage ---> Copy ----dataset1
------------------------------------|
-----------------------------------dataset2
That is to say, sort the data and then copy the sorted data to land in two different datasets.
The Sort stage sorts on two keys, namely sortcode (ranging from 1 to 12) and store#.
I have been experimenting with the partitioning to accomplish this, but I don't get what I need. When I use Auto partitioning everywhere in the specified stages, the results look similar to this:
sortcode ---------- store number ---- column1 ---- column 'n'
4 ----------------- A
4 ----------------- B
4 ----------------- C
6 ----------------- X
6 ----------------- Y
2 ----------------- U
2 ----------------- V
1 ----------------- I
1 ----------------- J
1 ----------------- K
4 ----------------- D
4 ----------------- E
I basically need all the records with sortcode 1 to be ahead of those with sortcode 2, and so on.
The execution mode of the datasets is set to parallel.
Any help on how I could make this work with a single Sort stage is greatly appreciated!
Thanks in advance!!
Sorting data - partition design question
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
Re: Sorting data - partition design question
dscon9128 wrote: The results when i use auto partition everywhere in the specified stages are similar to what is shown :
I can't understand the output. Generally a job doesn't produce output like that unless you have changed the partitioning.
Use hash partitioning on sortcode, unless you are using these datasets as a reference for a lookup.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Partition by sortcode (modulus or hash as the algorithm) and sort by sortcode then by store number. You did not give us the rule for what determines the data set into which a particular row goes - but presumably you can use a Switch, Filter or Transformer stage to effect that.
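To make the suggestion above concrete, here is a minimal Python sketch of the idea, not DataStage code. It assumes modulus partitioning on an integer sortcode and three partitions; the names `ROWS` and `partition_and_sort` are made up for illustration, and the rows are the sample values from the original post.

```python
# Toy model of "partition by sortcode, then sort by sortcode and store number".
# This only imitates the behaviour; it is not how the engine is implemented.

ROWS = [
    (4, "A"), (4, "B"), (4, "C"), (6, "X"), (6, "Y"), (2, "U"),
    (2, "V"), (1, "I"), (1, "J"), (1, "K"), (4, "D"), (4, "E"),
]


def partition_and_sort(rows, num_partitions):
    """Route each row by sortcode % num_partitions, then sort every partition."""
    partitions = [[] for _ in range(num_partitions)]
    for sortcode, store in rows:
        # Modulus partitioning: equal sortcodes always land in the same partition.
        partitions[sortcode % num_partitions].append((sortcode, store))
    # Tuples compare element by element: sortcode first, then store number.
    return [sorted(part) for part in partitions]


if __name__ == "__main__":
    for number, part in enumerate(partition_and_sort(ROWS, 3)):
        print(number, part)
```

Because every row with a given sortcode goes to one partition, each sortcode group ends up contiguous and ordered inside its partition, even though no single partition holds all the data.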
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
Re: Sorting data - partition design question
Hi,
Basically, the Sort stage sorts the data within each partition. At the output of the Sort stage the data is sorted within the partitions, and when writing to the dataset the data from all partitions gets collected, so if you look at the entire set of data the sorting looks as if it is lost. To avoid this you can set the Sort stage to execute sequentially.
I am not sure of the requirement, but you don't need to have the data completely sorted in the dataset. Had the target been a sequential file, complete sorting would only serve to make the data look the way you want.
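The effect described here (sorted within each partition, apparently unsorted when viewed as a whole) can be reproduced with a small Python sketch. It assumes, purely for illustration, that rows are dealt to two partitions round-robin; the helper names are hypothetical and nothing here claims to show what Auto partitioning actually does in a given job.

```python
# Toy model: sort each partition separately, then read the partitions back to back.

ROWS = [
    (4, "A"), (4, "B"), (4, "C"), (6, "X"), (6, "Y"), (2, "U"),
    (2, "V"), (1, "I"), (1, "J"), (1, "K"), (4, "D"), (4, "E"),
]


def round_robin(rows, num_partitions):
    """Deal rows to partitions in turn (an assumed partitioning scheme)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def is_sorted(rows):
    """True when every row is <= the row that follows it."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


def sort_partitions_then_concatenate(rows, num_partitions):
    """Sort inside each partition, then view all partitions as one list."""
    parts = [sorted(part) for part in round_robin(rows, num_partitions)]
    combined = [row for part in parts for row in part]
    return parts, combined


if __name__ == "__main__":
    parts, combined = sort_partitions_then_concatenate(ROWS, 2)
    print([is_sorted(part) for part in parts])  # each partition on its own
    print(is_sorted(combined))                  # the concatenated view
```

Each partition passes the `is_sorted` check on its own, while the concatenated list does not, which matches the kind of output shown in the original question.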
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
The given job design uses Data Sets as targets. Please don't introduce "red herrings". You can use a sort/merge collector if you need a sequential file to preserve sorting; it is not necessary to force the Sort stage to execute in sequential mode.
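As a rough model of what a sort/merge collector does, the Python sketch below merges partitions that are already sorted into one ordered stream without re-sorting them. It uses the standard-library `heapq.merge`; the partition contents and the name `sort_merge_collect` are assumptions made for illustration, not DataStage internals.

```python
import heapq

# Each inner list stands for one partition, already sorted on (sortcode, store).
SORTED_PARTITIONS = [
    [(6, "X"), (6, "Y")],
    [(1, "I"), (1, "J"), (1, "K"), (4, "A"), (4, "B"), (4, "C"), (4, "D"), (4, "E")],
    [(2, "U"), (2, "V")],
]


def sort_merge_collect(partitions):
    """Repeatedly take the smallest head row across partitions (a k-way merge)."""
    return list(heapq.merge(*partitions))


if __name__ == "__main__":
    for row in sort_merge_collect(SORTED_PARTITIONS):
        print(row)
```

The merge only works because every input partition is sorted on the same keys, which is why the Sort stage itself can keep running in parallel.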
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.