Page 1 of 1

Inner Join Issues

Posted: Wed Feb 13, 2008 6:19 am
by abhilashnair
I am performing an inner join of two tables. The left table contains only 216 records while the right table contains 5 million records.
However, I am getting output as 1896 records in the output.
We should ideally get only 216 records because thats what is there in the left table. Why is this happening?

Posted: Wed Feb 13, 2008 7:09 am
by Maveric
Could be because of duplicates in the right link. For a record in the left link, if there are multiple matches in the right link, then the output will be multiple records. Remove duplicates on the right link.

Posted: Wed Feb 13, 2008 8:31 am
by chulett
:? Didn't we just go thru this recently? Nothing about doing an inner join ensures you only get out as many rows as exist on one side of the join. It's all about the keys and data in both tables. It is about multiple matches on the 'right' table in your case.

Posted: Wed Feb 13, 2008 8:58 am
by abhilashnair
Two things I left out to mention:
The right table which has 5 million records has no duplicates for the key combination which I mentioned in the join.
Secondly, I fired the same inner join query in db2 which returned exactly 216 records.
So when we do a inner join on these tables via a SQL query you get 216 rows .
You create a PX job and u get 1896 records..this beats me

Posted: Wed Feb 13, 2008 9:47 am
by kcbland
Please describe your design: Are you talking about User-SQL or a Join stage? In your Join stage, how are you matching the keys (could leading spaces or zeroes be the difference)?

Posted: Wed Feb 13, 2008 10:02 am
by chulett
Which means you are not performing the same 'join' in your job, and why Ken is asking for details of your job design.

Posted: Thu Feb 14, 2008 12:29 am
by abhilashnair
I am using Join stage. there are two inputs to the join stage. Both inputs are ODBC stages. Both have a simple select statement with order by.
All the primary keys have been added to join stage.
There are no duplicates in the right table. In the left table there are duplicates but total number of rows is 216. For every row in left table there is one and only one match in the right table.

Posted: Thu Feb 14, 2008 12:49 am
by ray.wurlod
What partitioning are you using on the inputs to your Join stage?

Posted: Thu Feb 14, 2008 1:00 am
by abhilashnair
I tried hash as well as round robin..Both are yielding same result. I have also set disable sort insertion as true. The join stage is perfroming the sort

Posted: Thu Feb 14, 2008 4:56 am
by ray.wurlod
Try using explicit Sort stages on the inputs to the Join stage, even if they have Sort Mode set to "Don't sort (previously sorted)".

Posted: Fri Feb 15, 2008 1:21 am
by abhilashnair
I removed duplicates from the left table which had 216 records...Now the output of join is as expected i.e equal to number of records in left table.
But it still is a puzzle that when there were 216 records why the output had 1896 records. Say even if there are duplicates you should get only those number which you have in left table
?

Posted: Mon Feb 18, 2008 4:28 pm
by sajarman
Hey, i have a question here...... For an inner join, does it really matter if one is a left link and the other right?

Posted: Mon Feb 18, 2008 4:37 pm
by chulett
By definition, for an inner join one is left and one is right but it doesn't matter which is which.

Posted: Thu Feb 21, 2008 3:55 pm
by sajarman
Thanks Craig!