Hash on Sort Stage

yxie
Premium Member
Posts: 10
Joined: Tue Apr 03, 2007 3:46 pm

Post by yxie »

Hi folks,

I have a job that starts from two datasets, which are joined on two keys, SEQ and DATE. At the moment I hash-partition and sort both inputs in a Sort stage before they reach the Join stage.
My first question: I have already defined SEQ and DATE as keys on both datasets, but I am not sure whether the input datasets are actually partitioned on those two keys. If they are, can I skip the hash partitioning?
Secondly, after that Join stage there is a second Join stage that joins on SEQ only. It has two input links: one from the previous Join stage and one from one of the original datasets (I have a Sort stage and a Copy stage before the input to each join). Do I have to hash-partition and sort each link on SEQ again before this join? My guess is that both links are already sorted and partitioned on SEQ and DATE.
I just want to avoid unnecessary repartitioning and understand the theory better.

I would appreciate any thoughts.

Thanks in advance!

YXie
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago

Post by mikegohl »

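Do you know what the partition and join keys were when the datasets were written? You can partition all datasets by SEQ from the start. Rows that match on SEQ and DATE necessarily share the same SEQ, so they still land in the same partition for the first join, and you avoid having to repartition before the second join. You can still sort the data on SEQ and DATE.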
Michael Gohl
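
To make the suggestion above concrete, here is a minimal Python sketch of the partitioning logic. It is not DataStage code: the CRC32-based `partition_of` function, the four-partition count, and the sample rows are all assumptions made for illustration, and the real hash partitioner uses its own hash function. The sketch only shows the general property that hashing on SEQ alone keeps every row with the same SEQ in one partition, whereas hashing on SEQ and DATE together can spread one SEQ over several partitions, which is why a later join on SEQ alone would need a repartition.

```python
import zlib
from collections import defaultdict
from datetime import date

NUM_PARTITIONS = 4  # assumed node count, for illustration only


def partition_of(key_values, num_partitions=NUM_PARTITIONS):
    """Hypothetical stand-in for a hash partitioner: deterministic hash of the key."""
    encoded = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


def hash_partition(rows, key_columns, num_partitions=NUM_PARTITIONS):
    """Assign each row to a partition based on the listed key columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        partitions[partition_of(key, num_partitions)].append(row)
    return partitions


def partitions_per_seq(partitions):
    """Map each SEQ value to the set of partition ids that hold its rows."""
    seen = defaultdict(set)
    for pid, part_rows in partitions.items():
        for row in part_rows:
            seen[row["SEQ"]].add(pid)
    return seen


# Made-up sample data: 5 SEQ values, 10 dates each.
rows = [
    {"SEQ": seq, "DATE": date(2007, 4, day)}
    for seq in range(1, 6)
    for day in range(1, 11)
]

# Hash on SEQ only: every SEQ lives in exactly one partition, so joins on
# (SEQ, DATE) and on SEQ alone both find their matching rows locally.
seq_only_spread = partitions_per_seq(hash_partition(rows, ["SEQ"]))

# Hash on (SEQ, DATE): rows sharing a SEQ may land in different partitions,
# so a later join on SEQ alone would need a repartition first.
seq_date_spread = partitions_per_seq(hash_partition(rows, ["SEQ", "DATE"]))

# A sort on (SEQ, DATE) is also a valid sort on SEQ alone.
sorted_rows = sorted(rows, key=lambda r: (r["SEQ"], r["DATE"]))
seq_order = [r["SEQ"] for r in sorted_rows]

if __name__ == "__main__":
    print("SEQ only     :", dict(seq_only_spread))
    print("SEQ and DATE :", dict(seq_date_spread))
    print("sorted on SEQ:", seq_order == sorted(seq_order))
```

Running the script prints, for each SEQ, the partition ids its rows fall into under each scheme. The same reasoning applies to both inputs of a join: as long as both sides are partitioned with the same function on the same key, matching rows meet in the same partition.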