Which partition do I need to use?
I am not getting the target data in sorted order.
I have tried all types of partitioning, but with no luck.
The scenario is very simple:
ROW GENERATOR ----->SORT STAGE------->Data Set.
The source has only one column, of data type integer.
Source: Row Generator
COLUMN1
0
1
2
3
4
5
6
7
8
9
Target
1
2
3
4
5
0
6
7
8
9
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
These are being sorted correctly, but on two nodes. You will notice two sorted sub-lists. If you need the entire data set sorted as one list, run the whole thing on one node or in sequential mode.
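One quick way to check this is to split the target output into ascending runs. A fully sorted list forms a single run, while data sorted on two nodes and read back one partition after another forms at most two. Below is a minimal Python sketch using the target rows from the original post; `sorted_runs` is a hypothetical helper for illustration, not anything in DataStage.

```python
def sorted_runs(values):
    """Split a sequence into maximal ascending runs."""
    runs = []
    for v in values:
        # Extend the current run while values do not decrease;
        # otherwise start a new run.
        if runs and v >= runs[-1][-1]:
            runs[-1].append(v)
        else:
            runs.append([v])
    return runs

# Target rows exactly as listed in the original post.
target = [1, 2, 3, 4, 5, 0, 6, 7, 8, 9]
runs = sorted_runs(target)
print(runs)       # [[1, 2, 3, 4, 5], [0, 6, 7, 8, 9]]
print(len(runs))  # 2
```

Two runs means two independently sorted partitions, which matches a two-node configuration.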
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
What is your desired outcome?
As Ray has mentioned, the data IS sorted correctly. You are running a two-node configuration, so your dataset by default contains two partitions. The Row Generator stage runs sequentially by default, but the data is partitioned going into the Sort stage, most likely Hash on the sort key. The data is then sorted WITHIN the partitions, not across them.
When you view a partitioned dataset, you will typically see a block of records from one partition, then a block from another partition, and so on. The view (and peek in a DS job if running parallel) will not mingle the records together (they have no concept of how the data is supposed to be ordered). This is why your output appears as it does...it is showing you the records in one partition then the records in the other partition.
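The partition-then-sort behaviour and the block-by-block view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash_partition` uses `key % num_partitions` as a hypothetical stand-in for DataStage's real hash function, so the split differs from the poster's actual output, and `view` simply mimics reading one partition after another.

```python
def hash_partition(rows, num_partitions):
    """Assign each row to a partition based on its key.

    Hypothetical stand-in: key % num_partitions, NOT DataStage's hash function.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row % num_partitions].append(row)
    return partitions

def sort_within_partitions(partitions):
    # Each partition is sorted on its own, like a Sort stage running in parallel.
    return [sorted(p) for p in partitions]

def view(partitions):
    # A viewer shows one partition's block, then the next; rows are not interleaved.
    return [row for p in partitions for row in p]

rows = list(range(10))  # Row Generator output: 0..9
parts = sort_within_partitions(hash_partition(rows, 2))
print(parts)        # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print(view(parts))  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```

Every partition is sorted, yet the concatenated view is not, which is exactly the symptom in the original post.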
If you strictly want to see the data in order (0 1 2 3 4 5 6 7 8 9), either run the dataset in sequential mode or write to a sequential file, in either case using a Sort Merge collector on the input link. If you will be performing other logic downstream, rest assured that the data IS sorted correctly within each partition (you might think of each partition as an independent stream).
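A sort-merge collection reads already-sorted partitions and emits rows in key order across all of them. The Python sketch below uses `heapq.merge` to illustrate the idea. It assumes, as Ray's reply suggests, that the target listing is partition 0 followed by partition 1; `sort_merge_collect` is a hypothetical name, not a DataStage API.

```python
import heapq

# The two sorted partitions implied by the target listing in the original post
# (assumption: first block = partition 0, second block = partition 1).
partition_0 = [1, 2, 3, 4, 5]
partition_1 = [0, 6, 7, 8, 9]

def sort_merge_collect(*partitions):
    """Collect already-sorted partitions into one ordered stream.

    heapq.merge repeatedly emits the smallest head row across its inputs,
    which is the idea behind a Sort Merge collector.
    """
    return list(heapq.merge(*partitions))

print(sort_merge_collect(partition_0, partition_1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that the merge relies on each input already being sorted; it does not sort unsorted partitions.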
If you're concerned with exactly which rows go to which partition, read up on the various partitioning options and how to implement them in the IS manuals. You'll likely find that, for just viewing the data, it's not worth the extra effort that some of the partition types would require.
For large volumes of data you can use two adjacent Sort stages: one that sorts in parallel, and a second that executes sequentially, uses a Sort Merge collector on its input link, and does not otherwise sort at all.
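The two-stage pattern can be sketched in Python as a step that sorts each partition concurrently, followed by a single sequential step that only merges the pre-sorted streams. The partition contents are made up for illustration, and a thread pool stands in for DataStage's parallel nodes (CPython threads do not speed up CPU-bound sorting, so this shows the structure, not the performance). `parallel_sort` and `sequential_merge` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(partitions):
    # Stage 1: sort each partition independently, one worker per partition.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        return list(pool.map(sorted, partitions))

def sequential_merge(sorted_partitions):
    # Stage 2: a single sequential consumer that does NOT re-sort the data.
    # heapq.merge is lazy and only compares the current head row of each
    # partition, so it streams rows out in order.
    return heapq.merge(*sorted_partitions)

# Hypothetical, unsorted partitions as they might arrive from upstream.
partitions = [[5, 3, 1, 9], [8, 0, 6, 2], [7, 4]]
result = list(sequential_merge(parallel_sort(partitions)))
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The heavy sorting work is spread across partitions, while the sequential step only does a cheap k-way merge, which is why this layout suits large volumes.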