Page 1 of 1

Join and Merge results different from Lookup

Posted: Fri Dec 14, 2012 9:28 am
by wahi80
Hi,

I have two files both containing around 3mn records. I wanted to check out which option would be fastest for joining the data. So I used same set of files in different jobs on Join, Merge and Lookup but am getting different matched records. The Join and Merge show same set of matched records, but lookup shows a totally different number.

On furthere analysis the Lookup output is correct. I'm trying to figure out what is wrong with Join and Merge.

I have already ensured that the keys are hash partitoned and sorted.

What else could I be missing?

Posted: Fri Dec 14, 2012 3:31 pm
by ray.wurlod
HOW (on what columns) is the partitioning done? What partitioning are you using for the Lookup stage? Have you ensured that inputs to Join and Merge are correctly sorted?

Posted: Fri Dec 14, 2012 11:22 pm
by Rakesh311
Join and Merge will also provide you all the matched output ,i.e for the duplicate(multiple) key values if present in ref file also.

Where as in look up it will give you only the first match!!!

Posted: Sat Dec 15, 2012 1:25 am
by ray.wurlod
... unless you specify "multiple row return" in the Lookup stage.

Posted: Mon Dec 17, 2012 12:08 pm
by wahi80
In Lookup the primary link is in Auto Mode, reference link is Entire

In Join and Merge both the links are partitioned and sorted on the join keys.

What else am I missing? I have done this a 100 times before, but this one is bugging me...

Posted: Mon Dec 17, 2012 2:30 pm
by jwiles
What type of Join (Inner, Left/Right Outer, Full Outer)? What constraints in the Lookup (Drop, Fail, Continue)? Which had more output records, the Join or the Lookup?

Regards,

Posted: Mon Dec 17, 2012 3:08 pm
by wahi80
Join - Inner Join
Lookup - On failure Reject as I ahd to capture the records

Lookup had more matching records