Lookup Stage get more records than primary link records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Lookup Stage get more records than primary link records

Post by jenny_wang »

the job use a lookup stage, the primary link is reading records from a records.ds. the reference link is a oracle stage, the records.ds have 22697 records and the oracle stage output 304 records,but the lookup stage output 68091 records.

could you please help me to find out why this happened ?
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

Probably due to duplicates in the reference data. Remove the duplicates in the reference link on the lookup keys.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Possibly also due to incorrect partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Post by jenny_wang »

hi,Ray
which partition should be used? how can i know whether this is caused by wrong partition?

i found that there are duplicated entries in the oracle output, but there is no warnings in the log.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Hash partition by the lookup key on the stream input and either Hash or Entire partition by the lookup key on the reference input. Do check to see whether there are duplicates in your reference input - if you don't want them interpolate a Remove Duplicates stage on this link, or simply include a DISTINCT keyword in the Oracle query.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

Generally an entire partition is used on the reference link and the main link is left to auto. No warnings in the log (i am guessing) could be because your key fields are different from the lookup keys.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Post by jenny_wang »

hi, mahadev
if there exist duplicated entries, warnings will show up in the log. i have a column named site_unit, the oracle output two columns named site_key and site_unit, if the two site unit are identical, the site_key is output or the site_key is null. I am not clear about what you said " because your key fields are different from the lookup keys. "
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No warnings are logged if your Lookup stage specifies "allow duplicates".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Post by jenny_wang »

I remove the duplicated records and the job works well. thanks for help!
Post Reply