Hashing Issue

sgubba · Post by **sgubba** » Fri Feb 13, 2009 10:14 am

Hi Everybody

I am having a weird issue. My job uses close to five joins.
I have a driver table that I join to another table on key1; before that join I hash-partition both links on key1. From the joined table I get key2, which I use to join to a third table, and before that join I hash-partition on key2. On all the remaining joins I use Same partitioning. Every time I run the job I see different output counts. Can anyone tell me why the counts differ?

Thanks

Gopinath · Post by **Gopinath** » Fri Feb 13, 2009 12:22 pm

Hi,

You have to clear the previous partitioning before applying a new one. In the second join, which is on key2, you should clear the previous partitioning, which was on key1. If the following joins are also on key2 alone, use Same partitioning; otherwise clear the partitioning again at the third join and partition on the appropriate keys.

Thanks.
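To illustrate why stale partitioning matters, here is a minimal Python sketch, not DataStage code. It assumes each partition is joined only against the partition with the same index on the other input. The function names and the sample rows are hypothetical, and a dictionary lookup stands in for the join so that only the effect of partitioning is shown.

```python
def hash_partition(rows, key, n_parts):
    """Place each row in partition hash(row[key]) % n_parts."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        # For small non-negative ints, Python's hash() is the int itself,
        # so this assignment is deterministic for the sample data below.
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def join_per_partition(left_parts, right_parts, key):
    """Inner-join partition i of the left input with partition i of the right input only."""
    out = []
    for left, right in zip(left_parts, right_parts):
        lookup = {}
        for r in right:
            lookup.setdefault(r[key], []).append(r)
        for l in left:
            for r in lookup.get(l[key], []):
                out.append({**l, **r})
    return out


# Hypothetical sample data: output of the first join, carrying key1 and key2.
first_join_output = [
    {"key1": 1, "key2": 10},
    {"key1": 2, "key2": 11},
    {"key1": 3, "key2": 13},
    {"key1": 4, "key2": 12},
]
third_table = [{"key2": k, "val": v} for k, v in [(10, "a"), (11, "b"), (12, "c"), (13, "d")]]

N_PARTS = 2
right_parts = hash_partition(third_table, "key2", N_PARTS)

# Left input still partitioned on key1 (stale partitioning): matches are lost.
stale = join_per_partition(hash_partition(first_join_output, "key1", N_PARTS), right_parts, "key2")

# Left input repartitioned on key2: every matching pair meets in the same partition.
fixed = join_per_partition(hash_partition(first_join_output, "key2", N_PARTS), right_parts, "key2")

print(len(stale), len(fixed))
```

With this sample data the stale layout finds only some of the four matching rows, while the repartitioned layout finds all of them.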

sgubba · Post by **sgubba** » Fri Feb 13, 2009 4:26 pm

Yes, I am doing that, but I still have the issue.

Mike · Post by **Mike** » Fri Feb 13, 2009 6:06 pm

Join requires sorted inputs. You haven't mentioned whether you are sorting. Hash partition by the join key, then sort on it.

Mike
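As a rough illustration of why a merge-style join depends on sorted inputs, here is a generic sort-merge join sketch in Python. It is not the Join stage's actual implementation; `merge_join` and the sample rows are hypothetical. The same function is run on an unsorted and a sorted left input to compare the row counts.

```python
def merge_join(left, right):
    """Inner join of two lists of (key, value) tuples.

    Assumes both inputs are sorted by key; if they are not, rows are
    silently skipped rather than an error being raised.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left-hand row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


right = [(1, "x"), (2, "y"), (3, "z")]
unsorted_left = [(3, "c"), (1, "a"), (2, "b")]

rows_unsorted = merge_join(unsorted_left, right)
rows_sorted = merge_join(sorted(unsorted_left), right)

print(len(rows_unsorted), len(rows_sorted))
```

With the unsorted input the output depends on the order in which rows happen to arrive, which is one way run-to-run counts can differ when that order is not fixed.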

sgubba · Post by **sgubba** » Fri Feb 13, 2009 7:25 pm

No, I am not sorting them. I skipped the sort because I thought it would add overhead. Is it a hard and fast rule that we need to hash and sort before a join?

Thanks

vjonnala1516 · Post by **vjonnala1516** » Fri Feb 13, 2009 11:04 pm

Join: hash-partition and sort the data on the join keys. If the next join uses the same key, use Same partitioning; otherwise clear the partitioning, then repartition and re-sort the data on the new join keys.

ray.wurlod · Post by **ray.wurlod** » Sat Feb 14, 2009 2:10 am

Join stage (and certain others) mandatorily require that inputs be identically sorted and partitioned, in the case of the Join stage on the specified join keys. Usually this implies the Hash partitioning algorithm but, for a single integer key, the modulus partitioning algorithm may prove more efficient.
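
The two partitioning ideas can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and `zlib.crc32` is an assumed stand-in for a deterministic hash, not the hash function DataStage uses. The point is that applying the same function to the same key on both inputs sends matching rows to the same partition.

```python
import zlib


def modulus_partition(key: int, n_parts: int) -> int:
    """Partition index for an integer key: the remainder after division."""
    return key % n_parts


def hash_partition_index(key, n_parts: int) -> int:
    """Partition index for an arbitrary key, via a deterministic hash of its text form."""
    return zlib.crc32(str(key).encode("utf-8")) % n_parts


n_parts = 4
left_keys = [101, 102, 103, 104]
right_keys = [104, 103, 102, 101]  # same keys, different arrival order

# Identical partitioning on both inputs: each key gets the same partition index.
left_assignment = {k: modulus_partition(k, n_parts) for k in left_keys}
right_assignment = {k: modulus_partition(k, n_parts) for k in right_keys}

print(left_assignment == right_assignment)
```

Modulus skips the hashing step entirely, which is why it can be cheaper when the key is a single integer.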