Problem with Joiner stage

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 12:46 am

Hi All,

I have a scenario where is I am joining 2 datasets in a joiner.The dataset A.ds has 479 records and dataset B.ds has 383 records.I am doing inner join which should give 479 records in output.
I also have sort stage before join stage.I am sorting on the key which I also use to join.I tried these things:

1. I hash partitioned the inputs based on the same key on which I sort/join which geve me 518 records as output.

2. I removed partitioning and just sorted data before joining..which gave me 14 records as output.

3. I used lookup just for testing and it gave me 479 records which is proper.

I am not able to understand whats wrong here when i use join stage.If someone has some idea about this please let me know.

Thanks in advance
Shiva

ray.wurlod · Post by **ray.wurlod** » Tue Feb 12, 2008 1:35 am

Prove that there are no duplicates on either input to the Join stage. Hash partition AND sort the join keys.

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 2:05 am

There are duplicates in one of the dataset.But they are valid records.

Thanks

Shiva

Maveric · Post by **Maveric** » Tue Feb 12, 2008 2:29 am

For any given record in the left link, if there are multiple matches (duplicates) in the right link, then there will be multiple records in the output. So the record counts wont match. Use a remove duplicates stage for the right link and then try.

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 2:56 am

Yes I have used remove duplicate stage for the left link and removed all the duplicates and then used sort and join.But I have duplicates in the left link which is valid for me and I need to have them in target.

Maveric · Post by **Maveric** » Tue Feb 12, 2008 3:17 am

I said remove duplicates on the right link, and do a left outer join.

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 3:54 am

Sorry..I did remove the duplicates on the right link and joined.But as u suggested if I do left outer join I will get all the records from left link which do not match with right link as well .I want only those records which match.

Maveric · Post by **Maveric** » Tue Feb 12, 2008 4:09 am

Sorry, my mistake. Not left outer but Inner join.

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 4:18 am

No problem..I did as you asked me to do and also did inner join.Still the result is same.I am getting wrong result.

Maveric · Post by **Maveric** » Tue Feb 12, 2008 4:32 am

What do you mean wrong result? Record counts mismatch?

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 4:39 am

I have mentioned in the 1st post about the recourd count that I am getting.Please have a look at them.You will know about the wrong result which I am getting.

Thanks

ray.wurlod · Post by **ray.wurlod** » Tue Feb 12, 2008 6:32 am

Remove duplicates on both inputs.

shiva459 · Post by **shiva459** » Tue Feb 12, 2008 6:42 am

The dataset in the left link has been created from MQ which has xml messages.So when I create it the dataset can have duplicates for the col which I use to join.As some of them will be upddated records from the xml message.I also have the logic for new and updated record in the later job.
So I need all the matching records from the left link though they are dups

Thanks

Shiva