Page 1 of 1

Lookup Stage get more records than primary link records

Posted: Mon Jul 14, 2008 3:02 am
by jenny_wang
the job use a lookup stage, the primary link is reading records from a records.ds. the reference link is a oracle stage, the records.ds have 22697 records and the oracle stage output 304 records,but the lookup stage output 68091 records.

could you please help me to find out why this happened ?

Posted: Mon Jul 14, 2008 3:05 am
by mahadev.v
Probably due to duplicates in the reference data. Remove the duplicates in the reference link on the lookup keys.

Posted: Mon Jul 14, 2008 3:16 am
by ray.wurlod
Possibly also due to incorrect partitioning.

Posted: Mon Jul 14, 2008 3:26 am
by jenny_wang
hi,Ray
which partition should be used? how can i know whether this is caused by wrong partition?

i found that there are duplicated entries in the oracle output, but there is no warnings in the log.

Posted: Mon Jul 14, 2008 3:34 am
by ray.wurlod
Hash partition by the lookup key on the stream input and either Hash or Entire partition by the lookup key on the reference input. Do check to see whether there are duplicates in your reference input - if you don't want them interpolate a Remove Duplicates stage on this link, or simply include a DISTINCT keyword in the Oracle query.

Posted: Mon Jul 14, 2008 3:37 am
by mahadev.v
Generally an entire partition is used on the reference link and the main link is left to auto. No warnings in the log (i am guessing) could be because your key fields are different from the lookup keys.

Posted: Mon Jul 14, 2008 3:47 am
by jenny_wang
hi, mahadev
if there exist duplicated entries, warnings will show up in the log. i have a column named site_unit, the oracle output two columns named site_key and site_unit, if the two site unit are identical, the site_key is output or the site_key is null. I am not clear about what you said " because your key fields are different from the lookup keys. "

Posted: Mon Jul 14, 2008 4:19 am
by ray.wurlod
No warnings are logged if your Lookup stage specifies "allow duplicates".

Posted: Mon Jul 21, 2008 2:23 am
by jenny_wang
I remove the duplicated records and the job works well. thanks for help!