All,
I am stuck on a small issue with the Join stage. Below is the scenario I am trying to work on:
TABLE 1:
DATA_KEY  FIRST_DATA
1         A
2         B
3         C
2         D
4         E
TABLE 2:
DATA_KEY  LAST_DATA
1         v
2         w
3         x
2         y
4         z
DESIRED OUTPUT:
DATA_KEY  FIRST_DATA  LAST_DATA
1         A           V
2         B           W
3         C           X
2         D           Y
4         E           Z
For some reason I am not getting the desired output when I use an inner join (key column: DATA_KEY); the output I get has fewer rows.
Please guide me on how I can overcome this issue.
Regarding JOIN Stage
Re: Regarding JOIN Stage
It's not entirely clear from your example, but 1,2,3,2 appears to be the key. Both sides of the join need to be sorted on the key, in which case the keys should appear as 1,2,2,3.
Duplicates in the key will duplicate the output: there should be 4 output records for key 2.
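To make the duplicate-key point concrete, here is a minimal sketch in plain Python (not DataStage; the function name `sorted_inner_join` is made up for illustration). It sorts both inputs on the key and pairs every left row with every right row that shares the key, which is how an inner join treats duplicates. The data is copied from the two tables in the question.

```python
from itertools import groupby
from operator import itemgetter

def sorted_inner_join(left, right):
    """Inner join two lists of (key, value) rows on the key (hypothetical helper)."""
    # Sort both sides on the key, as a sort-based join expects.
    left = sorted(left, key=itemgetter(0))
    right = sorted(right, key=itemgetter(0))
    # Group the sorted right side so each key maps to all of its values.
    right_groups = {k: [v for _, v in g]
                    for k, g in groupby(right, key=itemgetter(0))}
    out = []
    for key, left_value in left:
        # Each left row pairs with every right row that has the same key.
        for right_value in right_groups.get(key, []):
            out.append((key, left_value, right_value))
    return out

table1 = [(1, "A"), (2, "B"), (3, "C"), (2, "D"), (4, "E")]
table2 = [(1, "v"), (2, "w"), (3, "x"), (2, "y"), (4, "z")]

rows = sorted_inner_join(table1, table2)
key2_rows = [r for r in rows if r[0] == 2]
for r in rows:
    print(r)
```

Key 2 appears twice on each side, so it yields 2 x 2 = 4 joined rows (B/w, B/y, D/w, D/y). A plain key join therefore cannot produce the one-to-one pairing shown in the desired output.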
Modern Life is Rubbish - Blur
Re: Regarding JOIN Stage
You need to hash partition each input link on the same keys, in the same order. The order of the hash keys should be the same as the join keys.
Soumya
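As a rough illustration of why both links need the same hash keys, the sketch below is plain Python, not DataStage. The helper `hash_partition` and the choice of 2 partitions are assumptions made for the example. When both inputs are partitioned by the same function on the same key, rows with equal keys always land in the same partition, so each partition can be joined independently.

```python
def hash_partition(rows, num_partitions):
    """Distribute (key, value) rows across partitions by hashing the key (hypothetical helper)."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        # Same key -> same hash -> same partition index, on either input.
        parts[hash(key) % num_partitions].append((key, value))
    return parts

table1 = [(1, "A"), (2, "B"), (3, "C"), (2, "D"), (4, "E")]
table2 = [(1, "v"), (2, "w"), (3, "x"), (2, "y"), (4, "z")]

num_partitions = 2  # assumed node count, for illustration only
left_parts = hash_partition(table1, num_partitions)
right_parts = hash_partition(table2, num_partitions)

for i in range(num_partitions):
    # Sort within each partition on the key before joining.
    left_sorted = sorted(left_parts[i], key=lambda r: r[0])
    right_sorted = sorted(right_parts[i], key=lambda r: r[0])
    print(i, left_sorted, right_sorted)
```

If the two links were partitioned on different keys, or by different methods, matching rows could end up in different partitions and never meet, which would show up as missing output rows.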
Re: Regarding JOIN Stage
Soumya is right. Just hash partition and also sort your input key column.
soumya5891 wrote: You need to make a hash partition on each input link in the same order. The order of the hash keys will be same as join keys
Re: Regarding JOIN Stage
Thanks Soumya
blewip wrote: Both sides of the join need to be sorted on the key.
So that's how you do a Sort
Modern Life is Rubbish - Blur