Problem with Inner Join

nynali · Post by **nynali** » Tue Jun 19, 2007 12:29 am

Hi,
I am joining two datasets through an inner join.
These two datasets were previously sorted and partitioned on 3 keys in another job.
In the INNER JOIN job I apply some transformations to one of the datasets and then do the inner join. The join is on 5 keys, and the three keys mentioned above are among them. When I first ran this job with 1 lakh (100,000) records there was no problem, but with 10 lakh (1,000,000) records I get 2 extra records after the inner join. Please help me with this. I have also set the partitioning in the inner join stage to be the same as in the previous job, i.e. on the three keys.

ray.wurlod · Post by **ray.wurlod** » Tue Jun 19, 2007 1:35 am

Make the partitioning specific. Don't rely on (Auto) to preserve previous partitioning - it is just as likely to repartition using round robin.
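
To illustrate why the partitioning has to be explicit, here is a rough Python sketch (not DataStage code) of a partition-wise inner join. It assumes 4 partitions, made-up key columns `k1` to `k5`, and hypothetical helper names; it only shows that the join result depends on both inputs being hash partitioned on the same keys, and it does not try to reproduce the 2 extra records from the original post.

```python
from collections import defaultdict

PARTITION_KEYS = ("k1", "k2", "k3")          # the 3 keys used for partitioning
JOIN_KEYS = ("k1", "k2", "k3", "k4", "k5")   # the 5 keys used for the join

def make_row(i, value_name):
    # Hypothetical test row: k1 is unique, the other keys are derived from it.
    return {"k1": i, "k2": i % 7, "k3": i % 3, "k4": i % 5, "k5": i % 2, value_name: i}

def hash_partition(rows, key_fields, n):
    # Rows with equal key values always land in the same partition.
    parts = [[] for _ in range(n)]
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        parts[hash(key) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    # Rows are dealt out in turn, ignoring key values entirely.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def partitionwise_inner_join(left_parts, right_parts, key_fields):
    # Each partition is joined in isolation, as a parallel join would do.
    out = []
    for left, right in zip(left_parts, right_parts):
        index = defaultdict(list)
        for r in right:
            index[tuple(r[f] for f in key_fields)].append(r)
        for l in left:
            for r in index.get(tuple(l[f] for f in key_fields), []):
                out.append({**l, **r})
    return out

left_rows = [make_row(i, "lval") for i in range(100)]
right_rows = [make_row(i, "rval") for i in range(0, 100, 2)]

# Both inputs hash partitioned on the same 3 keys: every match is found.
hash_joined = partitionwise_inner_join(
    hash_partition(left_rows, PARTITION_KEYS, 4),
    hash_partition(right_rows, PARTITION_KEYS, 4),
    JOIN_KEYS,
)

# Both inputs repartitioned round robin: matching rows can end up apart.
rr_joined = partitionwise_inner_join(
    round_robin_partition(left_rows, 4),
    round_robin_partition(right_rows, 4),
    JOIN_KEYS,
)

print(len(hash_joined), len(rr_joined))
```

In this sketch the hash partitioned join finds all 50 matching rows, while the round robin version finds fewer, because rows with the same key values are no longer guaranteed to sit in the same partition. Partitioning on 3 of the 5 join keys is enough here, since rows that agree on all 5 keys necessarily agree on those 3.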