Hashing Issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


sgubba
Participant
Posts: 30
Joined: Wed Apr 16, 2008 11:06 am

Hashing Issue

Post by sgubba »

Hi everybody,

I am having a weird issue. My job uses close to five joins. I have a driver table that I join to another table on key1; before that join I hash partition both links on key1. From the joined output I get key2, which I use to join to a third table; before that join I hash partition on key2. But every time I run the job I see different output counts. On all the remaining joins I am using Same partitioning. Can anyone tell me why I am getting different counts?


Thanks
Gopinath
Participant
Posts: 52
Joined: Wed Apr 25, 2007 2:18 am
Location: Chennai

Post by Gopinath »

Hi,

You have to clear the previous partitioning before applying a new one. In the second join, which is on key2, you should clear the partitioning left over from key1. If the subsequent joins are also on key2 alone, use Same partitioning; otherwise clear it again at the third join and partition on the appropriate keys.

Thanks.
Gopinath
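
To illustrate the advice above, here is a minimal Python sketch (not DataStage code) of what can go wrong when one input of a key2 join is still partitioned on key1. The partition count, the row layout, and the helper names `hash_partition` and `join_per_partition` are all hypothetical, and Python's built-in `hash` only stands in for whatever hash function the engine actually uses.

```python
NUM_PARTITIONS = 4  # hypothetical degree of parallelism

def hash_partition(rows, key, n=NUM_PARTITIONS):
    """Distribute rows into n partitions by hashing rows[key].
    For small non-negative ints, CPython's hash() returns the int itself,
    so this sketch is deterministic."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def join_per_partition(left_parts, right_parts, key):
    """Inner join where partition i of the left input only ever sees
    partition i of the right input, as in a partitioned parallel join."""
    out = []
    for lpart, rpart in zip(left_parts, right_parts):
        lookup = {}
        for r in rpart:
            lookup.setdefault(r[key], []).append(r)
        for l in lpart:
            for r in lookup.get(l[key], []):
                out.append({**l, **r})
    return out

# Output of the first join: every row carries both key1 and key2.
joined = [{"key1": i, "key2": (i * 7) % 10} for i in range(100)]
# Third table, keyed on key2 only; every key2 value has exactly one match.
table3 = [{"key2": k, "val": f"v{k}"} for k in range(10)]

# Left input keeps its key1 partitioning, right input is hashed on key2.
stale = join_per_partition(hash_partition(joined, "key1"),
                           hash_partition(table3, "key2"), "key2")
# Both inputs repartitioned on key2 before the join.
fresh = join_per_partition(hash_partition(joined, "key2"),
                           hash_partition(table3, "key2"), "key2")

print(len(stale), len(fresh))
```

With both inputs repartitioned on key2, all 100 rows find their match. With the stale key1 partitioning, rows whose key2 partner sits in a different partition are silently dropped.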
sgubba
Participant
Posts: 30
Joined: Wed Apr 16, 2008 11:06 am

Post by sgubba »

Gopinath wrote:Hi,

You have to clear the previous partitioning before applying a new one. In the second join, which is on key2, you should clear the partitioning left over from key1. If the subsequent joins are also on key2 alone, use Same partitioning; otherwise clear it again at the third join and partition on the appropriate keys.

Thanks.
Yes, I am doing that, but I still have the issue.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Join requires sorted inputs. You haven't mentioned whether you are sorting... hash partition by key and sort.

Mike
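
A small Python sketch may help show why sorting matters here. It implements a generic single-pass merge join, which is an assumption about the general technique and not DataStage's actual Join stage code. The function name `merge_join` and the sample rows are hypothetical, and the right input is assumed to have unique keys to keep the sketch short.

```python
import random

def merge_join(left, right, key):
    """Single-pass inner merge join. Correct only when both inputs are
    sorted ascending on key. Assumes unique key values on the right."""
    out = []
    j = 0  # cursor into the right input; it only ever moves forward
    for l in left:
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        if j < len(right) and right[j][key] == l[key]:
            out.append({**l, **right[j]})
    return out

left = [{"key1": i, "a": i} for i in range(20)]
right = [{"key1": i, "b": i * 10} for i in range(20)]

# Both inputs sorted: every left row finds its match.
sorted_count = len(merge_join(left, right, "key1"))

# Left input in reverse order: the cursor jumps to the largest key first,
# and every later (smaller) key is missed.
reversed_count = len(merge_join(left[::-1], right, "key1"))

# Left input in a different arbitrary order each time: the count depends
# on the order in which the rows happen to arrive.
shuffled_counts = []
for seed in range(5):
    shuffled = left[:]
    random.Random(seed).shuffle(shuffled)
    shuffled_counts.append(len(merge_join(shuffled, right, "key1")))

print(sorted_count, reversed_count, shuffled_counts)
```

In a parallel job the arrival order of unsorted rows after repartitioning is generally not the same from run to run, which would be consistent with the changing counts described in this thread.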
sgubba
Participant
Posts: 30
Joined: Wed Apr 16, 2008 11:06 am

Post by sgubba »

Mike wrote:Join requires sorted inputs. You haven't mentioned whether you are sorting... hash partition by key and sort.

Mike
No, I am not sorting them. I didn't because I thought it would add overhead. Is it a hard and fast rule that we need to hash and sort before a join?

Thanks
vjonnala1516
Participant
Posts: 18
Joined: Fri Jan 04, 2008 5:28 am
Location: Bangalore

Post by vjonnala1516 »

Join: hash partition and sort the data on the join keys. If you are using the same key for the next level of joining, use Same; otherwise clear the partitioning, then repartition and re-sort the data on the new join keys.
VJ
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The Join stage (and certain others) requires that its inputs be identically sorted and partitioned; for the Join stage, that means on the specified join keys. Usually this implies the Hash partitioning algorithm but, for a single integer key, the Modulus partitioning algorithm may prove more efficient.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
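
The difference between the two partitioning algorithms mentioned above can be sketched in Python. This is an illustration under stated assumptions, not the engine's implementation: the function names are hypothetical, and CRC32 is only a stand-in for the hash function actually used.

```python
import zlib

def modulus_partition(key: int, n: int) -> int:
    """Modulus partitioning: the partition number is the integer key
    modulo the number of partitions. No hashing step is needed."""
    return key % n

def hash_partition(key, n: int) -> int:
    """Hash partitioning stand-in: CRC32 of the key's text form, modulo n.
    Works for any key type, at the cost of computing a hash per row."""
    return zlib.crc32(str(key).encode("utf-8")) % n

n = 4  # hypothetical number of partitions
keys = [101, 102, 103, 104, 101, 102]

mod_parts = [modulus_partition(k, n) for k in keys]
hash_parts = [hash_partition(k, n) for k in keys]

print(mod_parts)
print(hash_parts)
```

Either way, equal key values always land in the same partition, which is the property the join relies on. Modulus simply skips the hash computation, which is why it can be cheaper for a single integer key.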