DSXchange

Posted: **Mon Oct 03, 2011 3:54 pm**

The lookup input is a distinct combination of key1 and key2. Both key1 and key2 are matched against the same reference dataset and the same reference column.

The key1 data has no duplicates, but key2 does. When the lookup runs, I get a different output record count every time.
The reference data has no duplicates.
How do I get the correct output?

Posted: **Mon Oct 03, 2011 5:37 pm**

Can you give us an example? I'm not quite clear from your description.

Posted: **Mon Oct 03, 2011 8:05 pm**

Example:
Input:
key1 key2
A Q
B B
C B
D A

Lookup (reference):
A 1
B 2
C 3
D 4
Q 5

Expected output (looked-up values for key1 and key2):
1 5
2 2
3 2
4 1
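For reference, the intended result can be written as a few lines of plain Python (a sketch, not DataStage code). It assumes the reference fits in an in-memory dict and that every key is present in it; both keys are resolved against the same reference column, and the duplicate key2 value ("B") simply triggers two independent lookups.

```python
# Plain-Python illustration of the intended result (not DataStage code):
# key1 and key2 are both resolved against the same reference column.
reference = {"A": 1, "B": 2, "C": 3, "D": 4, "Q": 5}
stream = [("A", "Q"), ("B", "B"), ("C", "B"), ("D", "A")]

def double_lookup(rows, ref):
    # Duplicate values in key2 are harmless: each row does its own lookups.
    return [(ref[key1], ref[key2]) for key1, key2 in rows]

result = double_lookup(stream, reference)
print(result)  # [(1, 5), (2, 2), (3, 2), (4, 1)]
```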

Posted: **Mon Oct 03, 2011 9:08 pm**

Use a Copy stage on the reference and take two outputs from it (i.e., pass two reference links into the Lookup stage), then do the lookup.

DS User

Posted: **Mon Oct 03, 2011 9:29 pm**

I am already using a Copy stage on the reference, but I still get the same problem: every run produces a different output count.

Posted: **Mon Oct 03, 2011 10:00 pm**

harryhome wrote: I am already using a Copy stage on the reference, but I still get the same problem: every run produces a different output count.

To add: I am using hash partitioning and a unique sort on the input and on both reference links.

Posted: **Mon Oct 03, 2011 10:25 pm**

Based on your sample data, if you use Auto partitioning you should not run into this problem.
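To illustrate why the partitioning choice can matter, here is a toy Python simulation (not DataStage code). Assumptions, all hypothetical: two partitions, a made-up `part_of` function standing in for the real hash partitioner, the stream hash-partitioned on key2, and rows with a failed lookup dropped (the Lookup stage's failure action could instead be Continue, Fail, or Reject). When the reference is also hash-partitioned on its key, a row whose key1 value lives in a different partition from its key2 value cannot find key1. When every partition gets a full copy of the reference ("entire", which is what Auto typically resolves to on Lookup reference links), every row matches.

```python
# Hypothetical simulation of a partitioned lookup (not DataStage code).
NUM_PARTITIONS = 2

def part_of(key):
    # Deterministic stand-in for a hash partitioner (not DataStage's real hash).
    return sum(map(ord, key)) % NUM_PARTITIONS

def partitioned_lookup(stream, reference, reference_mode):
    # Build one reference table per partition.
    ref_parts = [dict() for _ in range(NUM_PARTITIONS)]
    for key, value in reference.items():
        if reference_mode == "hash":
            ref_parts[part_of(key)][key] = value  # each key in one partition only
        else:  # "entire": every partition receives a full copy
            for table in ref_parts:
                table[key] = value
    out = []
    for key1, key2 in stream:
        table = ref_parts[part_of(key2)]  # stream row lands in key2's partition
        # Assumed "Drop" behaviour: skip the row if either key has no match.
        if key1 in table and key2 in table:
            out.append((table[key1], table[key2]))
    return out

reference = {"A": 1, "B": 2, "C": 3, "D": 4, "Q": 5}
stream = [("A", "Q"), ("B", "B"), ("C", "B"), ("D", "A")]
print(partitioned_lookup(stream, reference, "hash"))    # [(1, 5), (2, 2)]
print(partitioned_lookup(stream, reference, "entire"))  # [(1, 5), (2, 2), (3, 2), (4, 1)]
```

With the hash-partitioned reference, rows (C, B) and (D, A) are lost because C and D sit in a different partition from B and A respectively.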


Posted: **Mon Oct 03, 2011 11:31 pm**

Please confirm that your job design looks like this:


               +-------+
               |       |
               |  Ref. |
               |       |
               +---+---+
                   |
                   V
               +-------+
               |       |
               |  Copy |
               |       |
               +-+---+-+
                 |   |
           ref1  |   |  ref2
                 V   V
               +-------+
               |       |
     ------>   |Lookup |  ------->
     stream    |       | 
               +-------+

Posted: **Tue Oct 04, 2011 11:49 am**

Yes Ray, it looks exactly like that: one reference, one Copy stage, two reference links, and the Lookup.

Now, when I give the Lookup this

input:
key1 key2
A A
B A
C A
D H
E G
F A

I get different output rows from run to run.

I am hash-partitioning and sorting the stream on key column key2.

Posted: **Tue Oct 04, 2011 3:07 pm**

So, what output ARE you getting?

How are the columns mapped on the output of the Lookup stage?


lookup duplicates