Partitioning & Sort

highpoint · Post by **highpoint** » Wed Oct 12, 2011 9:46 am

I have 3 questions:

Question 1:

I have a Transformer for which I am doing a link sort and hash partitioning to perform some logic.

The keys are:
Product (partitioned & Sorted)
minqty (sorted)
maxqty (sorted)
mincost (sorted)
maxcost (sorted)

Now the output of this goes to two streams: one with a Remove Duplicates stage and the other with a Join stage.

Remove Duplicates:

I would like to remove duplicates on product, minqty, maxqty, mincost, maxcost.

Do I have to do a link sort and hash partitioning on all of the above keys again, or can I use the same partitioning, since all rows for the same product will already be on the same partition?

Join stage:

Does the same apply to the Join stage as well?
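The situation in Question 1 can be modelled outside DataStage. The following is a minimal Python sketch, not DataStage code: the sample rows, the function names, and the use of `crc32` as the hash are all assumptions made for illustration. It shows that when rows are hash partitioned on product alone and sorted on all five keys within each partition, any two rows that are duplicates on the full key share a product, so they land in the same partition and end up adjacent after the sort.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sample rows: (product, minqty, maxqty, mincost, maxcost)
ROWS = [
    ("A", 1, 10, 5, 50),
    ("B", 1, 10, 5, 50),
    ("A", 1, 10, 5, 50),   # duplicate of the first row
    ("A", 11, 20, 5, 50),
    ("B", 1, 10, 5, 50),   # duplicate of the second row
    ("C", 1, 5, 2, 9),
]

def hash_partition(rows, num_partitions):
    """Route each row by a hash of product only (the partitioning key)."""
    parts = defaultdict(list)
    for row in rows:
        parts[crc32(row[0].encode()) % num_partitions].append(row)
    return parts

def dedup_sorted(rows):
    """Drop adjacent duplicates; only correct if rows are sorted on the full key."""
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out

def run(rows, num_partitions=3):
    result = []
    for part in hash_partition(rows, num_partitions).values():
        part.sort()  # models the link sort on all five keys within a partition
        result.extend(dedup_sorted(part))
    return result

deduped = run(ROWS)
```

The same co-location argument is why a join keyed on product could reuse this partitioning, provided the other input is partitioned on product in the same way.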

Question 2:

For a Transformer stage, when we do the link sort, there is an option in the Transformer properties to preserve the output sort order.

Do other stages such as Remove Duplicates, Join, Aggregator, and Filter maintain the sort order in their output partitions or not?

Question 3:

I see a stable sort option in the Transformer. I read the documentation, but it did not make complete sense to me.

I would appreciate it if someone could explain where stable sort can be used.

veera24 · Post by **veera24** » Wed Oct 12, 2011 12:05 pm

1. The Remove Duplicates stage will work with Same partitioning if you intend to remove duplicates based on the key fields you specified in the link sort. The same goes for the Join stage.
2. Why don't you open the stages and verify it?
3. Stable sort: use this when you want to preserve the order of a previously sorted data set. Records with equal sort keys keep the relative order they had in the input.
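To make the stable sort point concrete, here is a small Python sketch. It is only an analogy, not DataStage behaviour verified against the product: Python's built-in `sorted()` is documented as stable, and the rows and column meanings below are hypothetical. The data is first sorted by quantity, then stable-sorted by product, and the quantity order survives inside each product group.

```python
# Hypothetical rows: (product, minqty)
rows = [("B", 5), ("A", 7), ("B", 1), ("A", 3)]

# Earlier sort: by minqty only.
by_qty = sorted(rows, key=lambda r: r[1])

# Later stable sort: by product only. Rows with the same product keep
# the relative order established by the earlier sort on minqty.
by_product = sorted(by_qty, key=lambda r: r[0])
```

With an unstable sort, the order of the rows within each product group after the second sort would not be guaranteed.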

Please do not combine all your questions in one post; it becomes a mess.

highpoint · Post by **highpoint** » Wed Oct 12, 2011 5:48 pm

veera24 wrote: 2. Why don't you open the stages and verify it?

I opened the stages and did not find any option to preserve the sort order in their output.

So I only wanted to know whether, by default, these stages output the data on each partition in the same sorted order in which it arrived on the input link.

SURA · Post by **SURA** » Wed Oct 12, 2011 7:21 pm

Preserve partitioning. This is Propagate by default. If you have an input data set, it adopts Set or Clear from the previous stage. You can explicitly select Set or Clear. Select Set to request the next stage should attempt to maintain the partitioning.

If you set Preserve Partitioning to Clear, my understanding is that the next stage will not keep the partitioning method you had earlier.

DS User

highpoint · Post by **highpoint** » Wed Oct 12, 2011 9:41 pm

I am clear about the partitioning. My question was more about the sort within each partition.

Will these stages maintain the same sort order in their output as in their input, per partition?

SURA · Post by **SURA** » Wed Oct 12, 2011 10:18 pm

I am thinking about hash partitioning with a link sort. What happens in the downstream stage if Preserve Partitioning is marked as Clear?

Once hash partitioning is selected, the sort option is selected on the same link. If the partitioning is then cleared, the sort may no longer work or be valid.

Whereas if you use a Sort stage, it does not care what the partitioning is and will do its work, but you may get the wrong result.

If I am wrong, please correct me.

DS User

prakashdasika · Post by **prakashdasika** » Wed Oct 12, 2011 10:20 pm

Most stages that have a default parallel operator should maintain the sort order unless it is modified. However, the data loses its sort order when it passes through stages like the Aggregator.

The data also loses its sort order when a sequential operator runs between two parallel operators.
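One way to picture the sequential-operator case is to look at how sorted partitions get collected into a single stream. The Python sketch below is an illustration under assumptions, not DataStage internals: the two partitions and both function names are hypothetical, and it simply contrasts a round-robin style collection with a sort-merge style collection.

```python
import heapq
from itertools import chain, zip_longest

# Two hypothetical partitions, each already sorted.
p0 = [1, 4, 7]
p1 = [2, 3, 9]

_SKIP = object()  # filler for partitions that run out of rows early

def round_robin_collect(*parts):
    """Take one row from each partition in turn; ignores the sort key."""
    interleaved = chain.from_iterable(zip_longest(*parts, fillvalue=_SKIP))
    return [x for x in interleaved if x is not _SKIP]

def sort_merge_collect(*parts):
    """Merge sorted partitions by key, so the combined stream stays sorted."""
    return list(heapq.merge(*parts))

unordered = round_robin_collect(p0, p1)
ordered = sort_merge_collect(p0, p1)
```

Here `unordered` interleaves the rows and is no longer sorted, while `ordered` is. So whether sort order survives a sequential step depends on how the rows are brought together, which is worth checking in the job itself.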
