I have a question about partitioning. My job uses sort -> remove duplicates -> join.
The sort and remove duplicates (RD) stages are keyed on three columns (say key1, key2 and key3), and the join is keyed on two (key1 and key2). In this case, do I need to repartition (hash) at the join stage on these two keys?
These two join keys are already hash partitioned in the sort stage.
Regarding partitioning
Thanks,
Sajeev N
If your data is already partitioned on key1 and key2 prior to the sort/RD, there is no need to repartition for the join stage: the partitioning already meets the requirements for all of the logic, and RD will not affect the existing partitioning, since it only removes records. If your data is partitioned on key1, key2 and key3 prior to the sort/RD, either remove key3 from the partitioning strategy (preferred) or repartition and re-sort prior to the join.
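To see why partitioning on all three keys is not enough for a two-key join, here is a small Python sketch. It is not DataStage code: the CRC32-based `partition_of` function is only a stand-in for a hash partitioner, and the names `partition_of`, `is_join_safe` and `NUM_PARTITIONS` are made up for illustration. It checks whether all rows sharing the same (key1, key2) end up in one partition, which is what the join needs.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed node count, purely illustrative


def partition_of(row, keys, n=NUM_PARTITIONS):
    """Pick a partition from a stable hash of the chosen key columns.

    CRC32 is a stand-in here, not the hash DataStage actually uses.
    """
    key_bytes = "|".join(str(row[k]) for k in keys).encode()
    return zlib.crc32(key_bytes) % n


def is_join_safe(rows, partition_keys, join_keys):
    """Return True if every group of rows sharing join_keys sits in one partition."""
    partitions_seen = defaultdict(set)
    for row in rows:
        group = tuple(row[k] for k in join_keys)
        partitions_seen[group].add(partition_of(row, partition_keys))
    return all(len(parts) == 1 for parts in partitions_seen.values())


# Sample data: several key3 values for each (key1, key2) pair.
rows = [
    {"key1": a, "key2": b, "key3": c}
    for a in range(3)
    for b in range(3)
    for c in range(20)
]

join_keys = ["key1", "key2"]

# Partitioned on the join keys only: matching rows are always co-located.
safe_two = is_join_safe(rows, ["key1", "key2"], join_keys)

# Partitioned on all three keys: key3 can scatter matching rows.
safe_three = is_join_safe(rows, ["key1", "key2", "key3"], join_keys)

print("partition on key1,key2      ->", safe_two)
print("partition on key1,key2,key3 ->", safe_three)
```

Sorting on key1, key2, key3 within partitions that were hashed on key1 and key2 only still works for RD, because rows that agree on all three keys necessarily agree on the first two and so share a partition.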
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Re: Regarding partitioning
I agree with jwiles.