Which partitioning method do I need to use?

css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

I am not getting the target data in sorted order.
I have tried every partitioning type, but none of them helped.

The scenario is very simple:

ROW GENERATOR ----->SORT STAGE------->Data Set.

The source has only one column, and its data type is integer.


Source: Row Generator
COLUMN1
0
1
2
3
4
5
6
7
8
9

Target
1
2
3
4
5
0
6
7
8
9
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

These are being sorted correctly, on two nodes. You will notice two sorted sub-lists. If you need to sort across the entire data, run the whole thing on one node or in sequential mode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

Yes, we can achieve it by setting the job to sequential mode or by running it on a single node.

My job uses a two-node configuration.

Can you please let me know whether it is possible with any partitioning setting?

I feel it should be possible, but I do not know how. With Entire partitioning I do get sorted output, but every row appears twice.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Use Entire partitioning and then remove the duplicates. You will be doing twice the work, and then some more to negate the double work. Follow Ray's suggestion.
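
To make the cost concrete, here is a rough Python sketch (not DataStage code) of what Entire partitioning does on two nodes. The helper name `entire_partition` is hypothetical, and the model assumes Entire simply hands every partition a full copy of the input.

```python
def entire_partition(rows, num_partitions):
    """Model of Entire partitioning: every partition gets a full copy of the rows."""
    return [list(rows) for _ in range(num_partitions)]


rows = list(range(10))

# Each of the two partitions sorts its own complete copy of the data.
partitions = [sorted(p) for p in entire_partition(rows, 2)]

# Writing both partitions out yields every row twice.
written = [row for partition in partitions for row in partition]

# A separate de-duplication step is then needed to get back to ten rows.
deduplicated = sorted(set(written))

print(len(written), len(deduplicated))
```

Each node sorts all ten rows, so the sort work doubles, and the extra de-duplication pass exists only to undo the copy.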
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

I just want to know whether it is possible using the partitioning settings alone, with any method other than Entire.
If it is not possible, please confirm.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The first half in one partition, the second in the other, just by 'partition settings'? No.
-craig

"You can never have too many knives" -- Logan Nine Fingers
css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

In that case we are losing the power of partitioning, right?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

What is your desired outcome?

As Ray has mentioned, the data IS sorted correctly. You are running a two node configuration, therefore your dataset by default contains two partitions. The Row Generator stage is running sequential by default, but the data is partitioned going into the sort stage, most likely Hash on the sort key. The data is then sorted WITHIN the partitions, not across.
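
As an illustration only, the behaviour described above can be modelled in a few lines of Python (this is not DataStage code). The function `toy_hash` is a hypothetical stand-in for the engine's hash partitioner; it uses a simple modulo, so the split it produces differs from the one shown earlier in the thread.

```python
def toy_hash(value, num_partitions):
    """Hypothetical stand-in for the real hash partitioner (simple modulo)."""
    return value % num_partitions


def partition_rows(rows, num_partitions):
    """Send each row to the partition chosen by the toy hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[toy_hash(row, num_partitions)].append(row)
    return partitions


rows = [7, 2, 9, 0, 4, 1, 8, 3, 6, 5]

# Sort WITHIN each partition, never across partitions.
sorted_partitions = [sorted(p) for p in partition_rows(rows, 2)]

# A viewer shows one partition's block of rows, then the next partition's block.
viewed = [row for partition in sorted_partitions for row in partition]

print(sorted_partitions)
print(viewed)
```

Each partition is in order on its own, yet the concatenated view is not globally sorted, which is the same pattern as the two sorted sub-lists in the original post.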

When you view a partitioned dataset, you will typically see a block of records from one partition, then a block from another partition, and so on. The view (and peek in a DS job if running parallel) will not mingle the records together (they have no concept of how the data is supposed to be ordered). This is why your output appears as it does...it is showing you the records in one partition then the records in the other partition.

If you strictly are wanting to see the data in order--0 1 2 3 4 5 6 7 8 9--either run the dataset in sequential mode or write to a sequential file, in either case using a sort collection on the input link. If you will be performing other logic behind this, rest assured that the data IS sorted correctly within each partition (you might think of each partition as an independent stream).
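
Conceptually, a sort collection is a k-way merge of streams that are already sorted. A small Python sketch of that idea (not DataStage code), using the two partitions from the original post and the standard library's `heapq.merge` as a stand-in for the collector:

```python
import heapq

# The two sorted partitions exactly as they appear in the target data set.
partition_0 = [1, 2, 3, 4, 5]
partition_1 = [0, 6, 7, 8, 9]

# Merge the pre-sorted streams by repeatedly taking the smallest head row.
# No re-sorting of the data is performed.
collected = list(heapq.merge(partition_0, partition_1))

print(collected)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The merge only compares the row at the head of each stream, which is why it is cheap compared with sorting everything again on one node.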

If you're concerned with exactly which rows go to which partition, read up on the various partitioning options and how to implement them in the IS manuals. You'll likely find that, for just viewing the data, it's not worth the extra effort that some of the partition types would require.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

For large volumes of data you can use two adjacent Sort stages: the first sorts in parallel, and the second executes sequentially, using a Sort Merge collector on its input, and does not otherwise sort at all.
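
The shape of that two-stage design can be sketched in Python (a loose analogy, not DataStage code). The function name `parallel_sort_then_merge` is hypothetical, the rows are split round-robin purely for illustration, and a thread pool stands in for the parallel nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort_then_merge(rows, num_partitions=2):
    """Sort each partition concurrently, then merge the sorted streams in one place."""
    # Stage 1: split the rows (round-robin here) and sort each partition in parallel.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        sorted_partitions = list(pool.map(sorted, partitions))
    # Stage 2: a single sequential step that only merges and never re-sorts.
    return list(heapq.merge(*sorted_partitions))


rows = [7, 2, 9, 0, 4, 1, 8, 3, 6, 5]
merged = parallel_sort_then_merge(rows)
print(merged)
```

The expensive sorting stays parallel, and the sequential step is reduced to a linear merge of already ordered streams.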