Page 1 of 1

Problem with Joiner stage

Posted: Tue Feb 12, 2008 12:46 am
by shiva459
Hi All,

I have a scenario where is I am joining 2 datasets in a joiner.The dataset A.ds has 479 records and dataset B.ds has 383 records.I am doing inner join which should give 479 records in output.
I also have sort stage before join stage.I am sorting on the key which I also use to join.I tried these things:

1. I hash partitioned the inputs based on the same key on which I sort/join which geve me 518 records as output.

2. I removed partitioning and just sorted data before joining..which gave me 14 records as output.

3. I used lookup just for testing and it gave me 479 records which is proper.

I am not able to understand whats wrong here when i use join stage.If someone has some idea about this please let me know.

Thanks in advance
Shiva

Posted: Tue Feb 12, 2008 1:35 am
by ray.wurlod
Prove that there are no duplicates on either input to the Join stage. Hash partition AND sort the join keys.

Posted: Tue Feb 12, 2008 2:05 am
by shiva459
There are duplicates in one of the dataset.But they are valid records.

Thanks

Shiva

Posted: Tue Feb 12, 2008 2:29 am
by Maveric
For any given record in the left link, if there are multiple matches (duplicates) in the right link, then there will be multiple records in the output. So the record counts wont match. Use a remove duplicates stage for the right link and then try.

Posted: Tue Feb 12, 2008 2:56 am
by shiva459
Yes I have used remove duplicate stage for the left link and removed all the duplicates and then used sort and join.But I have duplicates in the left link which is valid for me and I need to have them in target.

Posted: Tue Feb 12, 2008 3:17 am
by Maveric
I said remove duplicates on the right link, and do a left outer join.

Posted: Tue Feb 12, 2008 3:54 am
by shiva459
Sorry..I did remove the duplicates on the right link and joined.But as u suggested if I do left outer join I will get all the records from left link which do not match with right link as well .I want only those records which match.

Posted: Tue Feb 12, 2008 4:09 am
by Maveric
Sorry, my mistake. Not left outer but Inner join.

Posted: Tue Feb 12, 2008 4:18 am
by shiva459
No problem..I did as you asked me to do and also did inner join.Still the result is same.I am getting wrong result.

Posted: Tue Feb 12, 2008 4:32 am
by Maveric
What do you mean wrong result? Record counts mismatch?

Posted: Tue Feb 12, 2008 4:39 am
by shiva459
I have mentioned in the 1st post about the recourd count that I am getting.Please have a look at them.You will know about the wrong result which I am getting.

Thanks

Posted: Tue Feb 12, 2008 6:32 am
by ray.wurlod
Remove duplicates on both inputs.

Posted: Tue Feb 12, 2008 6:42 am
by shiva459
The dataset in the left link has been created from MQ which has xml messages.So when I create it the dataset can have duplicates for the col which I use to join.As some of them will be upddated records from the xml message.I also have the logic for new and updated record in the later job.
So I need all the matching records from the left link though they are dups

Thanks

Shiva