Inner join is producing more rows than input

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Inner join is producing more rows than input

Post by abc123 »

The 2 input sources have 10 rows each. The output is much greater than 10. I am sorting and hash partitioning on the key column on the input of the join. How can it be?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What does an inner join have to do with how many rows come out? Never heard of a cartesian product? Depends on the nature of the join keys and the data being joined.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Worst case - every row has the same key value. Every row from the left input will therefore match every row from the right output. You'll get 100 rows output.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Yes. If my 2 input links have 100 rows each, the maximum I can get is 100 rows on an inner join, right? I can never get more than 100 but I am getting more. I am joining on the key.
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

abc123 wrote:Yes. If my 2 input links have 100 rows each, the maximum I can get is 100 rows on an inner join, right? I can never get more than 100 but I am getting more. I am joining on the key.
You still don't get it, do you?

With 100 input rows on each, you can get 100 * 100 = 10,000 rows at the most even with an inner join. This happens when the join happens to do a cartesian product since all the key values in both tables are the same. So, the number of output rows really depends on the keys and values.
gateleys
fridge
Premium Member
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

Post by fridge »

another way this can happen is if one of your keys contains nulls - ive seen this before and it produces oodles of output
Post Reply