Best partitioning method

jpraveen
Participant
Posts: 71
Joined: Sat Jun 06, 2009 7:10 am
Location: HYD

Best partitioning method

Post by jpraveen »

Hi All,

I am using Join and Lookup stages in my jobs. What is the best partitioning method for the Lookup stage and for the Join stage? I am running on a 2-node configuration file.
By default I am using Auto partitioning.
When Auto is selected, which partitioning method does DataStage actually apply internally?
Jaypee
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Re: Best partitioning method

Post by samyamkrishna »

Use any key based partitioning method.
jpraveen
Participant
Posts: 71
Joined: Sat Jun 06, 2009 7:10 am
Location: HYD

Post by jpraveen »

Hi,

Which key-based partitioning method should we use?
I tried Entire partitioning, but the records were doubled. I also suspect we cannot use Round Robin, because some records will go to the first node and some to the second.
Can anyone explain this?
Jaypee
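The two symptoms described above can be reproduced outside DataStage. The following is a minimal Python sketch, not DataStage code: it assumes both join inputs use the same method, and the helper names (entire, round_robin, join_per_partition) and the sample rows are made up for illustration.

```python
NODES = 2  # matches the 2-node configuration file mentioned above

def entire(rows, nodes=NODES):
    # Entire: every partition receives a complete copy of the input.
    return [list(rows) for _ in range(nodes)]

def round_robin(rows, nodes=NODES):
    # Round robin: row i goes to partition i % nodes; the key is ignored.
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def join_per_partition(left_parts, right_parts):
    # Each partition joins only the rows it holds locally.
    joined = []
    for left_rows, right_rows in zip(left_parts, right_parts):
        for lkey, lval in left_rows:
            for rkey, rval in right_rows:
                if lkey == rkey:
                    joined.append((lkey, lval, rval))
    return joined

left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(2, "w"), (1, "x"), (4, "y"), (3, "z")]

entire_rows = join_per_partition(entire(left), entire(right))
rr_rows = join_per_partition(round_robin(left), round_robin(right))

print(len(entire_rows))  # 8: each of the 4 matches is produced once per node
print(len(rr_rows))      # 0: matching keys landed on different nodes
```

With Entire on both inputs, every node sees all rows, so each match is emitted once per node. With Round Robin, rows are dealt out by position, so nothing guarantees that equal keys meet on the same node.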
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

Use Hash partitioning, and partition on the same keys that you are using in the join.
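As a rough illustration of why this works, here is a Python sketch in which both inputs are partitioned with the same hash function on the join key. It is only a model: zlib.crc32 is a stand-in and is not the hash function DataStage uses, and the function names and sample data are hypothetical.

```python
import zlib

def hash_partition(rows, nodes=2):
    # Route each row by a deterministic hash of its key (the first field).
    parts = [[] for _ in range(nodes)]
    for row in rows:
        digest = zlib.crc32(str(row[0]).encode("utf-8"))
        parts[digest % nodes].append(row)
    return parts

def join_per_partition(left_parts, right_parts):
    # Each partition joins only the rows it holds locally.
    joined = []
    for left_rows, right_rows in zip(left_parts, right_parts):
        for lkey, lval in left_rows:
            for rkey, rval in right_rows:
                if lkey == rkey:
                    joined.append((lkey, lval, rval))
    return joined

customers = [("C1", "Alice"), ("C2", "Bob"), ("C3", "Carol"), ("C4", "Dan")]
orders = [("C3", 30), ("C1", 10), ("C4", 40), ("C2", 20), ("C1", 15)]

joined = join_per_partition(hash_partition(customers), hash_partition(orders))
print(len(joined))  # 5: every order finds its customer exactly once
```

Because the same key always hashes to the same partition number on both inputs, every match is found once, with no duplicates and no missed rows.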
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Entire is a non-key partitioning method.

If you have a single key column and it is an integer, use Modulus; otherwise use Hash.

The key-based and non-key partitioning methods are listed below.

Key:
-----
Hash
Modulus
Range
DB2

Non Key:
---------
Round Robin
Random
Same
Entire
Cheers
Ravi K
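For a single integer key, Modulus assigns a row to the partition given by the key value modulo the number of partitions. A small Python sketch of that rule follows; the function name and key values are illustrative only.

```python
def modulus_partition(keys, nodes=2):
    # Modulus: partition number = integer key mod number of partitions.
    parts = [[] for _ in range(nodes)]
    for key in keys:
        parts[key % nodes].append(key)
    return parts

parts = modulus_partition([101, 102, 103, 104, 105, 106])
print(parts)  # [[102, 104, 106], [101, 103, 105]]
```

On two nodes, even keys go to partition 0 and odd keys to partition 1, so equal keys from both join inputs end up together without computing a hash.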
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's too simplistic. You not only have to use a key-based algorithm, but you must also make sure that the partitioning is based on (at least the first) join key and that the algorithm will yield sufficient distinct values to spread data across all the available nodes.
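The point about distinct values can be checked with a quick count of rows per partition. The sketch below is plain Python, not DataStage: zlib.crc32 stands in for the real hash function, and rows_per_partition and the sample keys are hypothetical.

```python
import zlib

def rows_per_partition(key_values, nodes=2):
    # Count how many rows a hash on this key would send to each partition.
    counts = [0] * nodes
    for value in key_values:
        digest = zlib.crc32(str(value).encode("utf-8"))
        counts[digest % nodes] += 1
    return counts

# A key with one distinct value: every row hashes to the same partition.
skewed = rows_per_partition(["US"] * 1000)

# A key with many distinct values: rows can spread across the partitions.
spread = rows_per_partition(range(1000))

print(skewed)
print(spread)
```

A key with too few distinct values leaves some nodes idle however the hashing is done, so the distribution of the partitioning key matters as much as the choice of algorithm.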
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.