Regarding JOIN Stage

Posted: Wed Jun 08, 2011 9:14 am
by krishna14
All,

I am stuck on a small issue with the JOIN stage. Below is the scenario I am trying to work on:

TABLE 1:
DATA_KEY  FIRST_DATA
1         A
2         B
3         C
2         D
4         E

TABLE 2:
DATA_KEY  LAST_DATA
1         v
2         w
3         x
2         y
4         z

DESIRED OUTPUT:
DATA_KEY  FIRST_DATA  LAST_DATA
1         A           V
2         B           W
3         C           X
2         D           Y
4         E           Z

For some reason I am not getting the desired output when I use an inner join (key column: DATA_KEY). The output I get has fewer rows ...

Please guide me on how I can overcome this issue.

Re: Regarding JOIN Stage

Posted: Wed Jun 08, 2011 10:56 am
by blewip
It's not entirely clear from your example; however, the 1,2,3,2 appear to be the key. Both sides of the join need to be sorted on the key.

In which case they should appear 1,2,2,3

Duplicates in the key will multiply the output: key 2 appears twice on each side, so there should be 4 output records for key 2.
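
To illustrate the point about sorting and duplicate keys, here is a minimal Python sketch of a sort-merge inner join run on the two tables from the first post. It is not DataStage code; the function name `sort_merge_inner_join` and the tuple layout are assumptions made purely for illustration.

```python
from operator import itemgetter

def sort_merge_inner_join(left, right):
    """Inner-join two lists of (key, value) pairs using a sort-merge pass.

    Hypothetical helper for illustration only; not a DataStage API.
    """
    # Both inputs must be sorted on the join key before merging.
    left = sorted(left, key=itemgetter(0))
    right = sorted(right, key=itemgetter(0))
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Every left row in the run pairs with every right row in the run.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out

table1 = [(1, "A"), (2, "B"), (3, "C"), (2, "D"), (4, "E")]
table2 = [(1, "v"), (2, "w"), (3, "x"), (2, "y"), (4, "z")]

rows = sort_merge_inner_join(table1, table2)
key2_rows = [r for r in rows if r[0] == 2]
for r in rows:
    print(r)
```

In this sketch key 2 produces four rows (B and D each paired with w and y), so a join on DATA_KEY alone yields seven rows in total rather than the five in the desired output.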

Re: Regarding JOIN Stage

Posted: Wed Jun 08, 2011 11:53 am
by soumya5891
You need to hash partition each input link in the same order. The order of the hash keys should be the same as the order of the join keys.

Re: Regarding JOIN Stage

Posted: Wed Jun 08, 2011 12:57 pm
by mobashshar
soumya5891 wrote: You need to make a hash partition on each input link in the same order. The order of the hash keys will be same as join keys
Soumya is right. Just hash partition both inputs and also sort them on the key column.
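
As a rough illustration of why the partitioning matters, the following Python sketch simulates a parallel join in which each partition only sees its own rows. It is not DataStage code: the helper names, the three-partition setup, and the round-robin comparison are all assumptions chosen for the example.

```python
def hash_partition(rows, n_parts):
    """Send each (key, value) row to the partition chosen by hashing its key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[0]) % n_parts].append(row)
    return parts

def round_robin_partition(rows, n_parts):
    """Deal rows out in arrival order, ignoring the key entirely."""
    parts = [[] for _ in range(n_parts)]
    for idx, row in enumerate(rows):
        parts[idx % n_parts].append(row)
    return parts

def partitioned_inner_join(left_parts, right_parts):
    """Join partition N of the left input only with partition N of the right input."""
    out = []
    for left, right in zip(left_parts, right_parts):
        for lk, lv in left:
            for rk, rv in right:
                if lk == rk:
                    out.append((lk, lv, rv))
    return out

table1 = [(1, "A"), (2, "B"), (3, "C"), (2, "D"), (4, "E")]
table2 = [(1, "v"), (2, "w"), (3, "x"), (2, "y"), (4, "z")]
N_PARTS = 3  # hypothetical number of partitions (nodes)

hashed = partitioned_inner_join(
    hash_partition(table1, N_PARTS), hash_partition(table2, N_PARTS)
)
dealt = partitioned_inner_join(
    round_robin_partition(table1, N_PARTS), round_robin_partition(table2, N_PARTS)
)

print(len(hashed))  # 7: every matching pair met in the same partition
print(len(dealt))   # 5: some key-2 rows landed in different partitions
```

With hash partitioning on the join key, all rows for a given key end up in the same partition on both links, so no matches are lost. When rows are spread without regard to the key, matching rows can sit in different partitions and never meet, which is one way a join can return fewer rows than expected.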

Posted: Wed Jun 08, 2011 9:26 pm
by krishna14
Thanks, I did what Soumya suggested and got the desired output. Thanks, all. I appreciate your response, Soumya.

Re: Regarding JOIN Stage

Posted: Thu Jun 09, 2011 2:02 am
by blewip
blewip wrote:Both sides of the join need to be sorted on the key.
Thanks Soumya

So that's how you do a Sort