Partition and sort

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pdntsap
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Partition and sort

Post by pdntsap »

Hello,

We have a requirement to sort on 10 keys, then remove duplicates based on the first 8 of those keys, and then join on the first 9 of them. We have two Sort stages followed by a Join stage, but the partitioning method we chose does not seem to give us the right output. I am looking for the partitioning and sorting approach that should be used in these stages. We have tried different partitioning options but are still confused.

Thanks.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Partition and sort

Post by kwwilliams »

Your partitioning requirement is not the same as your sort requirement. Choose one field that has high cardinality and is used in both sorts and as a key in the join, and partition on that field. The high cardinality will give you an even spread across your nodes; however, any field that the two sorts and the join have in common will work.
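To illustrate why one shared key is enough, here is a rough Python sketch (not DataStage code). It assumes 10 key columns, 4 nodes, and made-up row values; the helper names `hash_partition` and `groups_split_across_nodes` are hypothetical. Partitioning on a column that belongs to both the 8-key and the 9-key set keeps every dedup group and every join group on a single node.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key_index, num_nodes):
    """Assign each row to a node by hashing a single key column."""
    parts = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key_index]).encode()) % num_nodes
        parts[node].append(row)
    return parts

def groups_split_across_nodes(parts, prefix_len):
    """Return the key prefixes that ended up on more than one node."""
    seen = defaultdict(set)
    for node, part in parts.items():
        for row in part:
            seen[row[:prefix_len]].add(node)
    return [prefix for prefix, nodes in seen.items() if len(nodes) > 1]

# Invented sample data: column 0 has high cardinality, columns 8 and 9 vary.
rows = [
    (k0, 1, 1, 1, 1, 1, 1, 1, k8, k9)
    for k0 in range(20)
    for k8 in range(2)
    for k9 in range(2)
]

# Partition on column 0, which is part of the 8-key, 9-key and 10-key sets.
parts = hash_partition(rows, key_index=0, num_nodes=4)
split_8 = groups_split_across_nodes(parts, 8)  # dedup groups split across nodes
split_9 = groups_split_across_nodes(parts, 9)  # join groups split across nodes

# For contrast: column 9 is not among the first 8 keys, so partitioning on it
# is free to scatter rows of one dedup group over several nodes.
bad_parts = hash_partition(rows, key_index=9, num_nodes=4)
bad_split_8 = groups_split_across_nodes(bad_parts, 8)
```

With the shared key, `split_8` and `split_9` both come back empty, which is the property the duplicate removal and the join depend on.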

On your second sort, are you using the "Don't Sort (Previously Sorted)" option for the fields already sorted in the first sort? This won't affect the data outcome, but it would be a huge performance improvement in your job if you are not already using this method.
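The idea behind that option can be sketched in Python (again, not DataStage code, and only an analogy for what the Sort stage does internally). It assumes the rows already arrive sorted on their leading columns; `sort_within_groups` is a hypothetical helper. Only the remaining columns are sorted, one small group at a time, instead of re-sorting the whole data set.

```python
from itertools import groupby

def sort_within_groups(rows, presorted_len):
    """Rows are already ordered on the first `presorted_len` columns,
    so only the remaining columns are sorted inside each group."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[:presorted_len]):
        out.extend(sorted(group, key=lambda r: r[presorted_len:]))
    return out

# Invented sample: already sorted on the first two columns, not on the third.
rows = [
    (1, "a", 3),
    (1, "a", 1),
    (1, "b", 2),
    (2, "a", 9),
    (2, "a", 4),
]

result = sort_within_groups(rows, presorted_len=2)
```

The result matches a full sort on all columns, but each sort call only ever sees one group of rows at a time.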