Problem with Joiner stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Problem with Joiner stage

Post by shiva459 »

Hi All,

I have a scenario where is I am joining 2 datasets in a joiner.The dataset A.ds has 479 records and dataset B.ds has 383 records.I am doing inner join which should give 479 records in output.
I also have sort stage before join stage.I am sorting on the key which I also use to join.I tried these things:

1. I hash partitioned the inputs based on the same key on which I sort/join which geve me 518 records as output.

2. I removed partitioning and just sorted data before joining..which gave me 14 records as output.

3. I used lookup just for testing and it gave me 479 records which is proper.

I am not able to understand whats wrong here when i use join stage.If someone has some idea about this please let me know.

Thanks in advance
Shiva
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Prove that there are no duplicates on either input to the Join stage. Hash partition AND sort the join keys.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

There are duplicates in one of the dataset.But they are valid records.

Thanks

Shiva
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

For any given record in the left link, if there are multiple matches (duplicates) in the right link, then there will be multiple records in the output. So the record counts wont match. Use a remove duplicates stage for the right link and then try.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

Yes I have used remove duplicate stage for the left link and removed all the duplicates and then used sort and join.But I have duplicates in the left link which is valid for me and I need to have them in target.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

I said remove duplicates on the right link, and do a left outer join.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

Sorry..I did remove the duplicates on the right link and joined.But as u suggested if I do left outer join I will get all the records from left link which do not match with right link as well .I want only those records which match.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Sorry, my mistake. Not left outer but Inner join.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

No problem..I did as you asked me to do and also did inner join.Still the result is same.I am getting wrong result.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

What do you mean wrong result? Record counts mismatch?
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

I have mentioned in the 1st post about the recourd count that I am getting.Please have a look at them.You will know about the wrong result which I am getting.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Remove duplicates on both inputs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

The dataset in the left link has been created from MQ which has xml messages.So when I create it the dataset can have duplicates for the col which I use to join.As some of them will be upddated records from the xml message.I also have the logic for new and updated record in the later job.
So I need all the matching records from the left link though they are dups

Thanks

Shiva
Post Reply