Partitioning & Sort

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Partitioning & Sort

Post by highpoint »

I have 3 questions:

Question 1:

I have a Transformer for which I am doing a link sort and hash partitioning to perform some logic.

The keys are:
Product (partitioned & Sorted)
minqty (sorted)
maxqty (sorted)
mincost (sorted)
maxcost (sorted)

Now the output of this goes to two streams: one with a Remove Duplicates stage and the other with a Join stage.

Remove duplicate:

I would like to remove duplicates on product, minqty, maxqty, mincost, maxcost.

Do I have to do a link sort and hash partitioning on all the above keys, or can I use the same partitioning, since all rows for the same product will already be on the same partition?

Join stage:

Will the same apply to the Join stage as well?



Question 2:

For a Transformer stage, when we do the link sort, we have an option in the Transformer properties to preserve the output sort order.

Do other stages such as Remove Duplicates, Join, Aggregator, and Filter maintain the sort order in their output partitions or not?


Question 3:

I see a stable sort option in the Transformer. I read the documentation, but it did not make complete sense to me.

I would appreciate it if someone could explain where stable sort can be used.
veera24
Premium Member
Posts: 150
Joined: Thu Feb 07, 2008 9:37 pm
Location: NewYork

Post by veera24 »

1. The Remove Duplicates stage will work with Same partitioning if you intend to remove duplicates based on the key fields you mentioned in the link sort.
The same goes for the Join stage.
2. Why don't you open the stages and verify it?
3. Stable sort: use this if you want rows with equal sort keys to keep the order they had in previously sorted data sets.
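
Point 1 can be sketched outside DataStage. The Python below is only an illustration under stated assumptions: the sample rows, the `hash_partition` and `dedupe_sorted` helpers, and the partition count are all hypothetical, and `crc32` merely stands in for whatever hash the engine really uses. It shows why hashing on product alone is enough: two rows that match on all five keys necessarily match on product, so they land on the same partition and a per-partition sort brings them next to each other.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sample rows: (product, minqty, maxqty, mincost, maxcost)
rows = [
    ("A", 1, 10, 5, 50),
    ("B", 1, 10, 5, 50),
    ("A", 1, 10, 5, 50),   # duplicate of the first row
    ("C", 2, 20, 7, 70),
    ("B", 1, 10, 5, 50),   # duplicate of the second row
    ("A", 2, 20, 5, 50),
]

NUM_PARTITIONS = 3  # assumed degree of parallelism


def hash_partition(rows, num_partitions):
    """Assign each row to a partition using a hash of product only."""
    partitions = defaultdict(list)
    for row in rows:
        key = crc32(row[0].encode()) % num_partitions
        partitions[key].append(row)
    return partitions


def dedupe_sorted(sorted_rows):
    """Drop adjacent duplicates; sorted input makes duplicates adjacent."""
    result = []
    for row in sorted_rows:
        if not result or result[-1] != row:
            result.append(row)
    return result


partitions = hash_partition(rows, NUM_PARTITIONS)

deduped = []
for part in partitions.values():
    # Sort on all five keys within the partition, then dedupe locally.
    deduped.extend(dedupe_sorted(sorted(part)))

# The per-partition result matches a dedupe over the whole data set.
assert sorted(deduped) == sorted(set(rows))
```

The same reasoning covers a join keyed on product, provided the other input is partitioned and sorted the same way.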

Please do not combine all your questions in one post; it becomes a mess.
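
On the stable sort point, a small sketch may help. It uses Python's built-in `sorted()`, which is documented as stable, as a stand-in for the stage option; the rows are made up for illustration. The input is already ordered by minqty, and sorting by product alone keeps that earlier order inside each product group.

```python
# Hypothetical rows: (product, minqty), already sorted by minqty.
rows = [("B", 1), ("A", 2), ("A", 3), ("B", 4), ("A", 5)]

# Sort on product only. Because the sort is stable, rows with the same
# product keep the relative order they had in the input.
by_product = sorted(rows, key=lambda r: r[0])

for row in by_product:
    print(row)
```

With a non-stable sort, the rows within each product group could come out in any order, so the earlier minqty ordering would not be guaranteed.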
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

veera24 wrote:2. Why don't you open the stages and verify it?
I opened the stages and I didn't find any option to preserve sort order in the output of these stages.

So I only wanted to know whether, by default, these stages output sorted data on each partition, as provided on the input link per partition.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Partitioning & Sort

Post by SURA »


Preserve partitioning. This is Propagate by default. If you have an input data set, it adopts Set or Clear from the previous stage. You can explicitly select Set or Clear. Select Set to request that the next stage attempt to maintain the partitioning.
My understanding is that if you set preserve partitioning to Clear, the next stage will not keep the partitioning method you had earlier.

DS User
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Re: Partitioning & Sort

Post by highpoint »

I am clear about the partitioning. My question was more about the sort on each partition.

Will these stages maintain the same sort order in the output as in the input, per partition?
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Partitioning & Sort

Post by SURA »

I am thinking about hash partitioning and link sort. What will happen to the output stage if preserve partitioning is marked as Clear?

Once hash partitioning is selected, the sort option gets selected as well. In that case, if preserve partitioning is set to Clear, the sort may no longer work or be valid.

Whereas if you choose the Sort stage, it will not care what the partitioning is and will do the work, but you may get the wrong result.

If I am wrong, please correct me.

DS User
prakashdasika
Premium Member
Posts: 72
Joined: Mon Jul 06, 2009 9:34 pm
Location: Sydney

Post by prakashdasika »

Most stages that have a default parallel operator should maintain the sort order unless it is modified. However, the data loses its sort order when it encounters stages like Aggregator.

The data also loses its sort order when a sequential operator is run between two parallel operators.
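
The sequential-operator case can be illustrated with a rough Python sketch. It is not DataStage code: the two partitions and the `round_robin_collect` and `sort_merge_collect` helpers are hypothetical stand-ins for how partitions might be gathered into a single stream. Each partition is sorted on its own, yet interleaving them row by row breaks the order, while a merge on the sort key keeps it.

```python
import heapq
from itertools import zip_longest

# Two partitions, each sorted on its own (hypothetical key values).
part0 = [1, 4, 7]
part1 = [2, 3, 9]

_MISSING = object()  # marker for partitions that run out of rows early


def round_robin_collect(*parts):
    """Take one row from each partition in turn, ignoring the sort key."""
    out = []
    for group in zip_longest(*parts, fillvalue=_MISSING):
        out.extend(v for v in group if v is not _MISSING)
    return out


def sort_merge_collect(*parts):
    """Merge already-sorted partitions on the key, preserving the order."""
    return list(heapq.merge(*parts))


rr = round_robin_collect(part0, part1)
sm = sort_merge_collect(part0, part1)

print(rr)  # [1, 2, 4, 3, 7, 9]: per-partition order kept, overall order lost
print(sm)  # [1, 2, 3, 4, 7, 9]: overall order kept
```

So when sorted partitions have to pass through a sequential step, how they are collected decides whether the downstream data is still sorted.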
Prakash Dasika
ETL Consultant
Sydney
Australia