Join Problem

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Join Problem

Post by kashif007 »

I have joined two tables with an inner join. Table 1 has 5000 records and table 2 has 5000 records. The expected output cannot be more than 5000 records in this case. But I get way beyond 5000 records. It seems to me that the join is doing a cartesian product of both the tables. Can any one suggest me, how do I join two tables with out doing the cartesian product and getting the result equal to or less than 5000 records only. I don't understand why my join is working that way. Please suggest.
Regards
Kashif Khan
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Please describe your job design
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
swades
Premium Member
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

Try to Trim your data for Joining Keys.
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

Table 1 has 1000 records and 4 columns, table 2 have 1000 records and 3 columns. One column is common between both the tables. I am doing an inner join on that one column. But the output I get is 10,000 records. Its sort of multilying the data and producing the output.

Following is the scenario:

Table1: (1000 Records)
Key Column2 Column3 Column4

Table2: (1000 Records)
Key Column5 column6

Output Table: (10,000 Records after the join)
Key Column2 Column3 Column4 Column5 Column6

Source is oracle database and target is oracle database. So three oracle enterprise stage 1 for target and 2 for source.

Please suggest. I just need the matching records from both the files at the output. Its actually giving me more than expected.
Regards
Kashif Khan
swades
Premium Member
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

kashif007 wrote: I am doing an inner join on that one column
What is Data Type for this column?
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Here are some things you can check:

1. Check whether you are selecting the "Inner join" option correctly.
2. Any duplicates in the source or reference.
3. Any null values in the joining key.
4. Trim and sort the data before performing join on the key column.
5. Are you doing any partioning and re-partitioning on the data.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Make very sure that the inputs are identically key-partitioned and sorted.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
MTA
Participant
Posts: 37
Joined: Thu Feb 02, 2006 2:25 pm

Re: Join Problem

Post by MTA »

kashif007 wrote:I have joined two tables with an inner join. Table 1 has 5000 records and table 2 has 5000 records. The expected output cannot be more than 5000 records in this case. But I get way beyond 5000 records. It seems to me that the join is doing a cartesian product of both the tables. Can any one suggest me, how do I join two tables with out doing the cartesian product and getting the result equal to or less than 5000 records only. I don't understand why my join is working that way. Please suggest.
2 things kshif.
Are you using an enterprise stage to have a sql join the two tables. Then you need the node number condition.
If you are joinging them out side th EE stage then you would need to sort them and partition them in the appropriate way before join to get proper results of the join
If you are not using the EE stage please make sure that the stage is running in sequential mode and not parallel mode. If you are running a DB stage which is not EE stage and running in Parallel then you will get multiplication of rows extracted.

Please let know if this solves your issue
M.T.Anwer
The day the child realizes that all adults are imperfect he becomes an adolescent;
the day he forgives them, he becomes an adult; the day he forgives himself, he becomes wise.
-Aiden Nowlan
Post Reply