Page 1 of 1

Auto Partition method in Sort Stage

Posted: Fri Apr 24, 2009 9:38 am
by NigeGriff
When using the auto partition method in a sort stage, if i set the partition key to be column 1 but sort on columns 1 and column 2, the incoming data to the sort stage is partitioned on columns 1 and 2 rather than column 1 only.

My understanding was that auto would partition data based on the partition key specified on the stage input tab.

Posted: Fri Apr 24, 2009 3:22 pm
by ray.wurlod
If partitioning is (Auto) you don't get to set the partitioning key at all.

(Auto) on inputs to a Join stage will use all the sort keys as the partitioning key and Hash as the partitioning algorithm.

Auto is default partiiton method

Posted: Sat Apr 25, 2009 2:01 am
by sureshreddy2009
:lol: hi
auto is a default partiioning method
when we select auto it dont allow to select key for sort at link level or unique etc, conclusion is what ever we had given in keys place in sort stage then it sorts based on that key or keys, here one doubt when we select column1 as sort key and in partitioning place if we select partition other than auto like hash, modulus etc, and partition keys are column1, column2 then it performs sort on column1 or it aborts

Posted: Mon Apr 27, 2009 2:32 am
by NigeGriff
Thanks for the quick reply Ray.

In summary i think what this means is that using auto to decide the partitioning keys and method for a sort stage is OK except for the situation were the next stage after the sort is one such as a drop duplicates that needs to drop records based on column1.

Posted: Mon Apr 27, 2009 2:40 am
by ray.wurlod
You could always add a second Sort stage sorting just on column1 in mode "don't sort, already sorted" and have that Sort stage generate a Cluster Key Change column.

Posted: Mon Apr 27, 2009 4:41 am
by NigeGriff
Thanks for the quick reply Ray.

In summary i think what this means is that using auto to decide the partitioning keys and method for a sort stage is OK except for the situation were the next stage after the sort is one such as a drop duplicates that needs to drop records based on column1.