lookup duplicates

Posted: Mon Oct 03, 2011 3:54 pm
by harryhome
The lookup is on a distinct combination of key1 and key2. Both key1 and key2 are matched against the same column of the same reference dataset.

key1 has no duplicates, but key2 does. When the lookup is performed, I get a different output record count on every run.
The reference data has no duplicates.
How can I get the correct output?

Posted: Mon Oct 03, 2011 5:37 pm
by ray.wurlod
Can you give us an example? I'm not quite clear from your description.

Posted: Mon Oct 03, 2011 8:05 pm
by harryhome
example
input
key1 key2
A Q
B B
C B
D A

Lookup
A 1
B 2
C 3
D 4
Q 5

Expected output
1 5
2 2
3 2
4 1
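
For readers following along, the expected output above can be reproduced outside DataStage with a minimal Python sketch. It only illustrates the intended logic (two independent lookups against the same reference column); the names `reference`, `stream` and `lookup_both` are made up for this illustration and are not part of any DataStage API.

```python
# Reference data from the example: one key column, one value column.
reference = {"A": 1, "B": 2, "C": 3, "D": 4, "Q": 5}

# Stream rows as (key1, key2). key2 repeats, which is fine for a lookup.
stream = [("A", "Q"), ("B", "B"), ("C", "B"), ("D", "A")]


def lookup_both(rows, ref):
    """Resolve key1 and key2 independently against the same reference."""
    out = []
    for key1, key2 in rows:
        # Each key plays the role of one reference link; a miss yields None.
        out.append((ref.get(key1), ref.get(key2)))
    return out


result = lookup_both(stream, reference)
for row in result:
    print(*row)
```

Duplicates in key2 do not change the row count here: every stream row produces exactly one output row, because the reference itself has no duplicates.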

Posted: Mon Oct 03, 2011 9:08 pm
by SURA
Use a Copy stage on the reference and take two outputs from it, pass both as reference inputs to the Lookup stage, then do the lookup.

DS User

Posted: Mon Oct 03, 2011 9:29 pm
by harryhome
I am already using a Copy stage on the reference, but I still get the same problem: a different output count on every run.

Posted: Mon Oct 03, 2011 10:00 pm
by harryhome
harryhome wrote: using copy stage in reference only, but still getting the same problem, every run with different output count.
To add, I have hash partitioning and perform a unique sort on the input and on the two reference links.

Posted: Mon Oct 03, 2011 10:25 pm
by SURA
Going by your sample data, if you use Auto partitioning you should not run into trouble.

DS User

Posted: Mon Oct 03, 2011 11:31 pm
by ray.wurlod
Please confirm that your job design looks like this:

               +-------+
               |       |
               |  Ref. |
               |       |
               +---+---+
                   |
                   V
               +-------+
               |       |
               |  Copy |
               |       |
               +-+---+-+
                 |   |
           ref1  |   |  ref2
                 V   V
               +-------+
               |       |
     ------>   |Lookup |  ------->
     stream    |       | 
               +-------+

Posted: Tue Oct 04, 2011 11:49 am
by harryhome
Yes Ray, it looks exactly like that: one reference, one Copy, two reference links and a Lookup.

Now, when I give the Lookup this input:
key1 key2
A A
B A
C A
D H
E G
F A


I get a different number of output rows.

I am doing hash partitioning and a sort on the stream key column key2.
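
One possible reading of the partitioning described above can be modelled in Python. This is only a simplified sketch under stated assumptions, not a description of how DataStage behaves internally: it assumes the stream is hash partitioned on key2, that each reference link is either fully visible to every partition or hash partitioned on its own key, and that rows with a failed lookup are dropped. The toy hash function `partition_of` and the helper `partitioned_lookup` are hypothetical.

```python
reference = {"A": 1, "B": 2, "C": 3, "D": 4, "Q": 5}
stream = [("A", "Q"), ("B", "B"), ("C", "B"), ("D", "A")]


def partition_of(key, n_partitions):
    # Deterministic toy hash (an assumption, not the engine's real partitioner).
    return sum(ord(ch) for ch in key) % n_partitions


def partitioned_lookup(rows, ref, n_partitions, entire_reference):
    """Model a lookup where each partition sees only part of the reference."""
    matched = []
    for key1, key2 in rows:
        part = partition_of(key2, n_partitions)  # stream is hashed on key2
        if entire_reference:
            visible = ref  # every partition sees the whole reference
        else:
            # Reference hashed on its own key: only co-located rows are visible.
            visible = {k: v for k, v in ref.items()
                       if partition_of(k, n_partitions) == part}
        # Assumption: a row is dropped if either lookup fails.
        if key1 in visible and key2 in visible:
            matched.append((visible[key1], visible[key2]))
    return matched


with_entire = partitioned_lookup(stream, reference, 2, entire_reference=True)
with_hash = partitioned_lookup(stream, reference, 2, entire_reference=False)
print(len(with_entire), len(with_hash))
```

In this model the key1 lookup fails whenever the key1 reference row lands in a different partition from the stream row, so the output count depends on how the data is spread across partitions. Whether that is what happens in the actual job can only be confirmed from the real partitioning settings on each link.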

Posted: Tue Oct 04, 2011 3:07 pm
by ray.wurlod
So, what output ARE you getting?

How are the columns mapped on the output of the Lookup stage?