Join stage output quantity mismatch query
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
I have a job in which I am joining two datasets using a Join stage with the join type set to left outer.
I have sorted and partitioned both datasets on the join key.
However, I am still not getting the desired output.
I have around 6670 rows as input to the Join stage, but I get only 2099 rows as output. I thought 6670 would be the minimum possible output from a left outer join.
Any suggestions on what I should do?
Regards,
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
Re: Join stage output quantity mismatch query
Even though you have specified LEFT OUTER, make sure the input links are ordered properly (left and right) in the Join stage's link ordering.
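To illustrate why link ordering matters, here is a minimal plain-Python sketch of a left outer join. It is not DataStage code; the function name `left_outer_join` and the sample rows are made up for illustration. It only shows that the output can never have fewer rows than whichever input is treated as the left link, so swapping the links changes that lower bound.

```python
from collections import defaultdict


def left_outer_join(left, right, key):
    """Emit one pair per (left row, matching right row);
    a left row with no match is emitted once, paired with None."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            out.extend((row, match) for match in matches)
        else:
            out.append((row, None))
    return out


# Hypothetical data: five rows on one link, two on the other.
big = [{"k": k} for k in ["AA", "AB", "AC", "AD", "AE"]]
small = [{"k": "AA", "v": 1}, {"k": "ZZ", "v": 2}]

intended = left_outer_join(big, small, "k")  # big is the left link
swapped = left_outer_join(small, big, "k")   # links in the wrong order

print(len(intended), len(swapped))  # 5 2
```

If the larger dataset ends up as the right link, the row count can drop well below its input count, which is one possible explanation for the kind of shortfall described above.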
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
Entire is not a recommended partitioning method for the Join stage. Even so, it should increase the number of output rows, not decrease it.
Please explain what the keys are and which partitioning method is used on each stage, especially on both inputs to the Join stage.
Basically, we need more details on the job design.
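A rough simulation of the point about Entire, in plain Python rather than DataStage. The assumptions are mine: four partitions, the right link hash partitioned on the key (Auto may choose something else), and invented sample rows. With Entire on the left link, every partition gets a full copy of the left rows, so each partition emits at least one output row per left row.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of processing nodes


def left_outer_join(left, right, key):
    """Left outer join: unmatched left rows are paired with None."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            out.extend((row, match) for match in matches)
        else:
            out.append((row, None))
    return out


def entire_partition(rows, n):
    """Entire: every partition receives a full copy of the data."""
    return [list(rows) for _ in range(n)]


def hash_partition(rows, key, n):
    """Hash: rows with the same key value land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row[key].encode()) % n].append(row)
    return parts


left = [{"k": "AA"}, {"k": "AB"}, {"k": "AC"}]
right = [{"k": "AA", "v": 1}, {"k": "AB", "v": 2}]

single_node_rows = len(left_outer_join(left, right, "k"))

left_parts = entire_partition(left, NUM_PARTITIONS)
right_parts = hash_partition(right, "k", NUM_PARTITIONS)
parallel_rows = sum(
    len(left_outer_join(lp, rp, "k")) for lp, rp in zip(left_parts, right_parts)
)

print(single_node_rows, parallel_rows)  # 3 12
```

In this sketch the row count is inflated by the number of partitions, so Entire on the left link would explain too many rows rather than too few.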
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
The key is a character field of length 2. On one dataset the partitioning method is Auto, whereas on the other (the left link) it is Entire.
Regards,
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
I would suggest hash partitioning on the key well before the join, i.e. in the stages where you sort the data, and then using Same partitioning from there through to the Join stage.
Also check whether a unique sort option is enabled anywhere, since that would remove duplicates.
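Both suggestions can be sketched in plain Python (again, not DataStage code; the helper names, the four-partition setup, and the sample rows are hypothetical). Hash partitioning both inputs on the key keeps matching rows in the same partition, so the parallel row count equals the single-node count. A unique sort upstream, by contrast, drops duplicate keys before the join and lowers the row count.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of processing nodes


def left_outer_join(left, right, key):
    """Left outer join: unmatched left rows are paired with None."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            out.extend((row, match) for match in matches)
        else:
            out.append((row, None))
    return out


def hash_partition(rows, key, n):
    """Rows with the same key value always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row[key].encode()) % n].append(row)
    return parts


def sort_unique(rows, key):
    """Sort on the key and keep only the first row per key value."""
    seen, out = set(), []
    for row in sorted(rows, key=lambda r: r[key]):
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out


# Hypothetical left link with duplicate keys, right link with unique keys.
left = [{"k": k} for k in ["AA", "AA", "AB", "AB", "AB", "AC"]]
right = [{"k": "AA", "v": 1}, {"k": "AB", "v": 2}]

# Same hash partitioning on both links: no left row is lost or replicated.
parallel_rows = sum(
    len(left_outer_join(lp, rp, "k"))
    for lp, rp in zip(
        hash_partition(left, "k", NUM_PARTITIONS),
        hash_partition(right, "k", NUM_PARTITIONS),
    )
)

# A unique sort before the join silently shrinks the left link.
deduped_rows = len(left_outer_join(sort_unique(left, "k"), right, "k"))

print(parallel_rows, deduped_rows)  # 6 3
```

Whether a unique sort is actually the cause in the job discussed here cannot be told from the thread; the sketch only shows that it is one way a left outer join can appear to return fewer rows than were read.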
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'