Inner Join Issues

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Inner Join Issues

Post by abhilashnair »

I am performing an inner join of two tables. The left table contains only 216 records while the right table contains 5 million records.
However, I am getting output as 1896 records in the output.
We should ideally get only 216 records because thats what is there in the left table. Why is this happening?
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Could be because of duplicates in the right link. For a record in the left link, if there are multiple matches in the right link, then the output will be multiple records. Remove duplicates on the right link.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Didn't we just go thru this recently? Nothing about doing an inner join ensures you only get out as many rows as exist on one side of the join. It's all about the keys and data in both tables. It is about multiple matches on the 'right' table in your case.
Last edited by chulett on Wed Feb 13, 2008 9:25 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

Two things I left out to mention:
The right table which has 5 million records has no duplicates for the key combination which I mentioned in the join.
Secondly, I fired the same inner join query in db2 which returned exactly 216 records.
So when we do a inner join on these tables via a SQL query you get 216 rows .
You create a PX job and u get 1896 records..this beats me
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Please describe your design: Are you talking about User-SQL or a Join stage? In your Join stage, how are you matching the keys (could leading spaces or zeroes be the difference)?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which means you are not performing the same 'join' in your job, and why Ken is asking for details of your job design.
-craig

"You can never have too many knives" -- Logan Nine Fingers
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I am using Join stage. there are two inputs to the join stage. Both inputs are ODBC stages. Both have a simple select statement with order by.
All the primary keys have been added to join stage.
There are no duplicates in the right table. In the left table there are duplicates but total number of rows is 216. For every row in left table there is one and only one match in the right table.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What partitioning are you using on the inputs to your Join stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I tried hash as well as round robin..Both are yielding same result. I have also set disable sort insertion as true. The join stage is perfroming the sort
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try using explicit Sort stages on the inputs to the Join stage, even if they have Sort Mode set to "Don't sort (previously sorted)".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I removed duplicates from the left table which had 216 records...Now the output of join is as expected i.e equal to number of records in left table.
But it still is a puzzle that when there were 216 records why the output had 1896 records. Say even if there are duplicates you should get only those number which you have in left table
?
sajarman
Participant
Posts: 41
Joined: Mon Nov 28, 2005 6:29 am

Post by sajarman »

Hey, i have a question here...... For an inner join, does it really matter if one is a left link and the other right?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

By definition, for an inner join one is left and one is right but it doesn't matter which is which.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sajarman
Participant
Posts: 41
Joined: Mon Nov 28, 2005 6:29 am

Post by sajarman »

Thanks Craig!
Post Reply