Inner Join Issues

abhilashnair · Post by **abhilashnair** » Wed Feb 13, 2008 6:19 am

I am performing an inner join of two tables. The left table contains only 216 records while the right table contains 5 million records.
However, I am getting output as 1896 records in the output.
We should ideally get only 216 records because thats what is there in the left table. Why is this happening?

Maveric · Post by **Maveric** » Wed Feb 13, 2008 7:09 am

Could be because of duplicates in the right link. For a record in the left link, if there are multiple matches in the right link, then the output will be multiple records. Remove duplicates on the right link.

chulett · Post by **chulett** » Wed Feb 13, 2008 8:31 am

Didn't we just go thru this recently? Nothing about doing an inner join ensures you only get out as many rows as exist on one side of the join. It's all about the keys and data in both tables. It is about multiple matches on the 'right' table in your case.

abhilashnair · Post by **abhilashnair** » Wed Feb 13, 2008 8:58 am

Two things I left out to mention:
The right table which has 5 million records has no duplicates for the key combination which I mentioned in the join.
Secondly, I fired the same inner join query in db2 which returned exactly 216 records.
So when we do a inner join on these tables via a SQL query you get 216 rows .
You create a PX job and u get 1896 records..this beats me

kcbland · Post by **kcbland** » Wed Feb 13, 2008 9:47 am

Please describe your design: Are you talking about User-SQL or a Join stage? In your Join stage, how are you matching the keys (could leading spaces or zeroes be the difference)?

chulett · Post by **chulett** » Wed Feb 13, 2008 10:02 am

Which means you are not performing the same 'join' in your job, and why Ken is asking for details of your job design.

abhilashnair · Post by **abhilashnair** » Thu Feb 14, 2008 12:29 am

I am using Join stage. there are two inputs to the join stage. Both inputs are ODBC stages. Both have a simple select statement with order by.
All the primary keys have been added to join stage.
There are no duplicates in the right table. In the left table there are duplicates but total number of rows is 216. For every row in left table there is one and only one match in the right table.

ray.wurlod · Post by **ray.wurlod** » Thu Feb 14, 2008 12:49 am

What partitioning are you using on the inputs to your Join stage?

abhilashnair · Post by **abhilashnair** » Thu Feb 14, 2008 1:00 am

I tried hash as well as round robin..Both are yielding same result. I have also set disable sort insertion as true. The join stage is perfroming the sort

ray.wurlod · Post by **ray.wurlod** » Thu Feb 14, 2008 4:56 am

Try using explicit Sort stages on the inputs to the Join stage, even if they have Sort Mode set to "Don't sort (previously sorted)".

abhilashnair · Post by **abhilashnair** » Fri Feb 15, 2008 1:21 am

I removed duplicates from the left table which had 216 records...Now the output of join is as expected i.e equal to number of records in left table.
But it still is a puzzle that when there were 216 records why the output had 1896 records. Say even if there are duplicates you should get only those number which you have in left table
?

sajarman · Post by **sajarman** » Mon Feb 18, 2008 4:28 pm

Hey, i have a question here...... For an inner join, does it really matter if one is a left link and the other right?

chulett · Post by **chulett** » Mon Feb 18, 2008 4:37 pm

By definition, for an inner join one is left and one is right but it doesn't matter which is which.

sajarman · Post by **sajarman** » Thu Feb 21, 2008 3:55 pm

Thanks Craig!