Page 1 of 1

Inner join is producing more rows than input

Posted: Wed Jan 23, 2008 5:09 pm
by abc123
The 2 input sources have 10 rows each. The output is much greater than 10. I am sorting and hash partitioning on the key column on the input of the join. How can it be?

Posted: Wed Jan 23, 2008 6:06 pm
by chulett
What does an inner join have to do with how many rows come out? Never heard of a cartesian product? Depends on the nature of the join keys and the data being joined.

Posted: Wed Jan 23, 2008 7:57 pm
by ray.wurlod
Worst case - every row has the same key value. Every row from the left input will therefore match every row from the right output. You'll get 100 rows output.

Posted: Thu Jan 24, 2008 11:18 am
by abc123
Yes. If my 2 input links have 100 rows each, the maximum I can get is 100 rows on an inner join, right? I can never get more than 100 but I am getting more. I am joining on the key.

Posted: Thu Jan 24, 2008 12:23 pm
by gateleys
abc123 wrote:Yes. If my 2 input links have 100 rows each, the maximum I can get is 100 rows on an inner join, right? I can never get more than 100 but I am getting more. I am joining on the key.
You still don't get it, do you?

With 100 input rows on each, you can get 100 * 100 = 10,000 rows at the most even with an inner join. This happens when the join happens to do a cartesian product since all the key values in both tables are the same. So, the number of output rows really depends on the keys and values.

Posted: Sun Jan 27, 2008 12:47 pm
by fridge
another way this can happen is if one of your keys contains nulls - ive seen this before and it produces oodles of output