Partitioning

Bilwakunj
Participant
Posts: 59
Joined: Fri Sep 10, 2004 7:00 am

Post by Bilwakunj »

Hi,
In my DDL, four columns make up the primary key, and for all the Join stages I'm using "Hash" partitioning. Is this the right approach, or should I go for "Auto"? My impression is that when we say "Auto", DataStage internally uses "round robin" or "entire" partitioning, depending on the previous stages and the preserve partitioning flag. As per my requirement I shouldn't be going for either of them, so I'm using "Hash". Just wondering, is this the correct approach?
Thanks in advance.
GIDs
Participant
Posts: 16
Joined: Sun May 23, 2004 2:39 pm
Location: San Francisco

Post by GIDs »

Using Hash is the better choice: you are guaranteed correct results. You have to sort the input on all input links (if not previously sorted) in the same order as your join key, but you can partition on just one or two of those columns, the ones you think will give a good distribution of your data while still grouping it into distinct sets.
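To illustrate why partitioning on only one or two of the key columns is still safe for a join: rows that match on the full four-column key necessarily match on any subset of it, so they hash to the same partition. The sketch below is plain Python, not DataStage code; the column names k1..k4, the CRC32 hash, and the helper functions are all assumptions made for illustration and do not reflect how the engine hashes internally. It compares hash partitioning on two key columns with round robin.

```python
from collections import defaultdict
from zlib import crc32

JOIN_KEY = ("k1", "k2", "k3", "k4")  # hypothetical four-column key


def hash_partition(rows, key_cols, n_parts):
    """Assign each row to a partition from a hash of the chosen columns."""
    parts = defaultdict(list)
    for row in rows:
        key = "|".join(str(row[c]) for c in key_cols)
        parts[crc32(key.encode("utf-8")) % n_parts].append(row)
    return parts


def round_robin_partition(rows, n_parts):
    """Deal rows out to partitions in turn, ignoring their content."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def all_colocated(left_parts, right_parts, key_cols):
    """True if every full join key lives in exactly one partition
    across both inputs, which is what a partition-local join needs."""
    seen = defaultdict(set)
    for parts in (left_parts, right_parts):
        for p, rows in parts.items():
            for row in rows:
                seen[tuple(row[c] for c in key_cols)].add(p)
    return all(len(ps) == 1 for ps in seen.values())


def make_rows(order):
    """Build sample rows; each i in 0..29 yields a distinct full key."""
    return [{"k1": i % 5, "k2": i % 3, "k3": i % 2, "k4": 0} for i in order]


left = make_rows(range(30))
right = make_rows(reversed(range(30)))  # same keys, different arrival order

# Hash on a two-column subset of the join key, same columns on both links.
hashed_ok = all_colocated(
    hash_partition(left, ("k1", "k2"), 4),
    hash_partition(right, ("k1", "k2"), 4),
    JOIN_KEY,
)

# Round robin depends on arrival order, so matching rows can be split up.
round_robin_ok = all_colocated(
    round_robin_partition(left, 4),
    round_robin_partition(right, 4),
    JOIN_KEY,
)

print("hash on (k1, k2) keeps matches together:", hashed_ok)
print("round robin keeps matches together:", round_robin_ok)
```

The trade-off in picking fewer columns is distribution: a subset with few distinct values puts more rows into fewer partitions, so choose columns that still spread the data reasonably evenly.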