Hash partition

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

subrat
Premium Member
Posts: 77
Joined: Tue Dec 11, 2007 5:54 am
Location: UK

Hash partition

Post by subrat »

Hi

Can anyone help me understand how hash partitioning happens in a parallel job? How is the key generated?

Subrat
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You identify the key column(s). The characters from those columns are used as input to a hashing function, which always returns the same uint32 value (the "hashvalue") for any given set of characters.

This "hashvalue" is divided by the number of partitions and the remainder is the partition number to which that row is allocated.
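As a rough illustration of those two steps, here is a minimal Python sketch. It is not DataStage's actual hashing function, which is not given in this thread; `zlib.crc32` is used only as a stand-in because it is deterministic and returns an unsigned 32-bit value. The function name `hash_partition` and the separator used to join the key columns are assumptions made for the example.

```python
import zlib

def hash_partition(key_columns, partition_count):
    """Return the partition number for a row, given its key column values."""
    # Join the characters of the key columns into one string.
    # The "\x1f" separator is an assumption of this sketch.
    key_chars = "\x1f".join(str(value) for value in key_columns)
    # Stand-in hash: crc32 always yields the same unsigned 32-bit
    # integer for the same input bytes.
    hashvalue = zlib.crc32(key_chars.encode("utf-8"))
    # The remainder after dividing by the partition count is the partition number.
    return hashvalue % partition_count

# The same key values always map to the same partition.
p1 = hash_partition(["UK", 1001], 4)
p2 = hash_partition(["UK", 1001], 4)
```

Calling the function twice with identical key values gives the same partition number both times, and the result is always between 0 and `partition_count - 1`.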
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
subrat
Premium Member
Posts: 77
Joined: Tue Dec 11, 2007 5:54 am
Location: UK

Post by subrat »

Could u please briefly explain what is meant by 'which always returns the same uint32 value'?
Moreover, can a processing node contain more than one partition?



ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I don't believe that U's technical expertise is up to providing the explanation you seek so I will attempt your edification.

Code:

hashvalue = f(keyvalue)
hashvalue is the uint32 result
f() is the partitioning algorithm

Code:

partition_number_for_row = Mod(hashvalue, partition_count)

There is a maximum of one partition per processing node. Use of node pools may mean that there are fewer partitions than processing nodes.
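To see what the Mod() step does to a whole set of rows, the following Python sketch buckets a few sample rows by partition number. Everything in it is illustrative: `partition_rows`, the `cust_id` column and the sample data are hypothetical, and `zlib.crc32` merely stands in for f(), whose real implementation is not described here.

```python
import zlib
from collections import defaultdict

def partition_rows(rows, key, partition_count):
    """Group rows into buckets numbered 0 .. partition_count - 1."""
    partitions = defaultdict(list)
    for row in rows:
        # Stand-in for f(keyvalue): a deterministic unsigned 32-bit hash.
        hashvalue = zlib.crc32(str(row[key]).encode("utf-8"))
        # Equivalent of Mod(hashvalue, partition_count).
        partitions[hashvalue % partition_count].append(row)
    return partitions

# Hypothetical sample rows; two of them share the key value "A17".
rows = [
    {"cust_id": "A17", "amount": 10},
    {"cust_id": "B42", "amount": 25},
    {"cust_id": "A17", "amount": 5},
]
partitions = partition_rows(rows, "cust_id", 4)
```

Because the hash depends only on the key value, both "A17" rows end up in the same bucket, whichever bucket that turns out to be.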
subrat
Premium Member
Posts: 77
Joined: Tue Dec 11, 2007 5:54 am
Location: UK

Post by subrat »

Thanks, Ray, for this valuable info.

Can you also please advise: if we are doing hash partitioning, is it always better to use all of the table keys as partition keys as well? If so, can I do the same for other types of partitioning?

Moreover, in the case of Join, Lookup, etc., is the data matched only within a partition, or across partitions as well?

Thanks
Subrat