lookup duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

harryhome
Participant
Posts: 112
Joined: Wed Oct 18, 2006 7:10 am

lookup duplicates

Post by harryhome »

The input to the Lookup stage is a distinct combination of key1 and key2. Both key1 and key2 are matched against the same column of the same reference dataset.

key1 has no duplicates, but key2 does. Each time the lookup runs, I get a different output record count.
The reference data has no duplicates.
How can I get the correct output?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you give us an example? I'm not quite clear from your description.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
harryhome
Participant
Posts: 112
Joined: Wed Oct 18, 2006 7:10 am

Post by harryhome »

example
input
key1 key2
A Q
B B
C B
D A

Lookup
A 1
B 2
C 3
D 4
Q 5

Expected output
1 5
2 2
3 2
4 1
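
To make the expected result concrete, here is a minimal Python sketch of the logic only, not of the DataStage job itself. It assumes each key is resolved independently against the same reference column; the names `reference`, `stream` and `lookup_both` are made up for illustration.

```python
# Reference data from the example: key -> value.
reference = {"A": 1, "B": 2, "C": 3, "D": 4, "Q": 5}

# Stream (input) rows from the example: (key1, key2).
stream = [("A", "Q"), ("B", "B"), ("C", "B"), ("D", "A")]


def lookup_both(rows, ref):
    """Resolve key1 and key2 independently against the same reference."""
    return [(ref[k1], ref[k2]) for k1, k2 in rows]


output = lookup_both(stream, reference)
for row in output:
    print(*row)
```

Duplicates in key2 are harmless here: each stream row finds exactly one reference row per key, so four input rows should always give four output rows.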
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Use a Copy stage on the reference link and take two outputs from it. Pass both outputs to the Lookup stage as reference inputs and do the lookup there.

DS User
harryhome
Participant
Posts: 112
Joined: Wed Oct 18, 2006 7:10 am

Post by harryhome »

I am already using a Copy stage on the reference link, but I still get the same problem: every run produces a different output count.
harryhome
Participant
Posts: 112
Joined: Wed Oct 18, 2006 7:10 am

Post by harryhome »

harryhome wrote: I am already using a Copy stage on the reference link, but I still get the same problem: every run produces a different output count.
To add, I use hash partitioning and perform a unique sort on the input and the two reference links.
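
One possible cause, not confirmed anywhere in this thread, is that hash partitioning the reference links sends a reference row to a different node from the stream row that needs it. The Python sketch below only simulates that idea on the sample data. `NODES`, `node_for` and `run_lookup` are hypothetical stand-ins, not DataStage APIs, and the sketch assumes that a row failing either lookup is dropped.

```python
REFERENCE = {"A": 1, "B": 2, "C": 3, "D": 4, "Q": 5}
STREAM = [("A", "Q"), ("B", "B"), ("C", "B"), ("D", "A")]
NODES = 2  # assumed number of processing nodes


def node_for(key):
    """Hypothetical stand-in for the engine's hash partitioner."""
    return ord(key) % NODES


def run_lookup(ref_partitioning):
    # Stream link: hash partitioned on key2.
    stream_parts = [[] for _ in range(NODES)]
    for k1, k2 in STREAM:
        stream_parts[node_for(k2)].append((k1, k2))

    if ref_partitioning == "entire":
        # Every node holds a full copy of the reference data.
        ref_parts = [dict(REFERENCE) for _ in range(NODES)]
    else:
        # Reference data is hash partitioned on its own key.
        ref_parts = [{} for _ in range(NODES)]
        for key, value in REFERENCE.items():
            ref_parts[node_for(key)][key] = value

    out = []
    for rows, ref in zip(stream_parts, ref_parts):
        for k1, k2 in rows:
            # Assumption: a row is dropped when either lookup fails.
            if k1 in ref and k2 in ref:
                out.append((ref[k1], ref[k2]))
    return sorted(out)


hash_result = run_lookup("hash")
entire_result = run_lookup("entire")
print(len(hash_result), "rows with hash-partitioned reference")
print(len(entire_result), "rows with Entire-partitioned reference")
```

In this simulation the stream is partitioned on key2 only, so the key1 lookup can miss whenever key1 hashes to a different node. Giving every node the full reference data avoids that. The sketch shows how rows can go missing; it does not by itself explain why the count changes from run to run.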
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Going by your sample data, you should not have any trouble if you use Auto partitioning.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please confirm that your job design looks like this:

               +-------+
               |       |
               |  Ref. |
               |       |
               +---+---+
                   |
                   V
               +-------+
               |       |
               |  Copy |
               |       |
               +-+---+-+
                 |   |
           ref1  |   |  ref2
                 V   V
               +-------+
               |       |
     ------>   |Lookup |  ------->
     stream    |       | 
               +-------+
harryhome
Participant
Posts: 112
Joined: Wed Oct 18, 2006 7:10 am

Post by harryhome »

Yes Ray, it looks exactly like that: one reference, one Copy stage, two reference links and the Lookup.

Now, when I give the Lookup the following input:
key1 key2
A A
B A
C A
D H
E G
F A


I get different output rows on each run.

I am hash partitioning and sorting the stream link on key column key2.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

So, what output ARE you getting?

How are the columns mapped on the output of the Lookup stage?