DSXchange

Posted: **Thu Feb 08, 2007 7:31 am**

I have a job wherein I am joining two datasets using a join stage and join type as left outer.
I have sorted and partitioned both datasets on the join key.
However, still I am not getting the desired output.

I have around 6670 rows as input to the join stage, but as output, I get only 2099. I thought that 6670 remains the minimum possible when performing left-outer join.

Any suggestions on what I should do?

Posted: **Thu Feb 08, 2007 9:10 am**

vivekreddy wrote:I have a job wherein I am joining two datasets using a join stage and join type as left outer.
I have sorted and partitioned both datasets on the join key.
However, still I am not getting the desired output.

I have around 6670 rows as input to the join stage, but as output, I get only 2099. I thought that 6670 remains the minimum possible when performing left-outer join.

Any suggestions on what I should do?

Though you have mentioned LEFT OUTER, make sure the files are Link Ordered properly [Left and Right]

Posted: **Thu Feb 08, 2007 3:41 pm**

It may also be possible that your data are not partitioned based upon the join key (in addition to any other potential cause). Check that also.

Posted: **Thu Feb 08, 2007 11:28 pm**

All done, still not working

Posted: **Thu Feb 08, 2007 11:32 pm**

What type of partition is used?
Just be aware that, 6670 will be output, only if its a Outer Join on the dataset which has 6670 number of records.

Posted: **Thu Feb 08, 2007 11:58 pm**

Entire

Posted: **Fri Feb 09, 2007 12:40 am**

Entire wont be a prescribed partition for Join stage. But still this will increase the number of resultant rows and not decrease.
Now explain more on what are the keys, and what is the partition that used on which stage and especially on the join stage, for both the input.
Basically need more details on job design.

Posted: **Fri Feb 09, 2007 12:51 am**

The key is a character field of length 2. In one dataset, partitioning method is Auto, whereas from the other, the left link, it is entire.

Posted: **Fri Feb 09, 2007 12:53 am**

The partitioning method on the Join stage is Entire.

Posted: **Fri Feb 09, 2007 2:30 am**

I would suggest, do a hash partition on the Key, well before join, i.e,. the stages where you sort the data. And use same partition till Join stage.
Check if by any chance, you have any unique sort option enabled, and it removes duplicates.

DSXchange

Join stage output quantity mismatch query

Join stage output quantity mismatch query

Re: Join stage output quantity mismatch query