Page 1 of 1

Join Problem

Posted: Tue Aug 07, 2007 12:04 pm
by kashif007
I have joined two tables with an inner join. Table 1 has 5000 records and table 2 has 5000 records. The expected output cannot be more than 5000 records in this case. But I get way beyond 5000 records. It seems to me that the join is doing a cartesian product of both the tables. Can any one suggest me, how do I join two tables with out doing the cartesian product and getting the result equal to or less than 5000 records only. I don't understand why my join is working that way. Please suggest.

Posted: Tue Aug 07, 2007 12:12 pm
by kcbland
Please describe your job design

Posted: Tue Aug 07, 2007 12:48 pm
by swades
Try to Trim your data for Joining Keys.

Posted: Tue Aug 07, 2007 1:30 pm
by kashif007
Table 1 has 1000 records and 4 columns, table 2 have 1000 records and 3 columns. One column is common between both the tables. I am doing an inner join on that one column. But the output I get is 10,000 records. Its sort of multilying the data and producing the output.

Following is the scenario:

Table1: (1000 Records)
Key Column2 Column3 Column4

Table2: (1000 Records)
Key Column5 column6

Output Table: (10,000 Records after the join)
Key Column2 Column3 Column4 Column5 Column6

Source is oracle database and target is oracle database. So three oracle enterprise stage 1 for target and 2 for source.

Please suggest. I just need the matching records from both the files at the output. Its actually giving me more than expected.

Posted: Tue Aug 07, 2007 1:56 pm
by swades
kashif007 wrote: I am doing an inner join on that one column
What is Data Type for this column?

Posted: Tue Aug 07, 2007 2:35 pm
by us1aslam1us
Here are some things you can check:

1. Check whether you are selecting the "Inner join" option correctly.
2. Any duplicates in the source or reference.
3. Any null values in the joining key.
4. Trim and sort the data before performing join on the key column.
5. Are you doing any partioning and re-partitioning on the data.

Posted: Tue Aug 07, 2007 2:50 pm
by ray.wurlod
Make very sure that the inputs are identically key-partitioned and sorted.

Re: Join Problem

Posted: Tue Aug 07, 2007 3:10 pm
by MTA
kashif007 wrote:I have joined two tables with an inner join. Table 1 has 5000 records and table 2 has 5000 records. The expected output cannot be more than 5000 records in this case. But I get way beyond 5000 records. It seems to me that the join is doing a cartesian product of both the tables. Can any one suggest me, how do I join two tables with out doing the cartesian product and getting the result equal to or less than 5000 records only. I don't understand why my join is working that way. Please suggest.
2 things kshif.
Are you using an enterprise stage to have a sql join the two tables. Then you need the node number condition.
If you are joinging them out side th EE stage then you would need to sort them and partition them in the appropriate way before join to get proper results of the join
If you are not using the EE stage please make sure that the stage is running in sequential mode and not parallel mode. If you are running a DB stage which is not EE stage and running in Parallel then you will get multiplication of rows extracted.

Please let know if this solves your issue