Page 1 of 1

Lookup stage key multiple matches handling

Posted: Thu Dec 14, 2006 3:34 am
by dsguy
Hi

I have multiple key values coming from the reference table there by getting a warning . The reference table is from sybase and the source is a dataset.

Please let me know how to handle muliple matches in lookup stage.

Warning
-----------
Lkp1,0: Ignoring duplicate entry at table record 140; no further warnings will be issued for this table

Thanks,
Girish

Unique sort.

Posted: Thu Dec 14, 2006 4:13 am
by Mahadev
Use a Sort Stage for the reference data, sort on the key fields and change "Allow duplicates" property to "False" or use a Remove Duplicates stage.

Hope this helps,

Posted: Thu Dec 14, 2006 4:53 am
by dsguy
Hi Mahadev

The jobs are already existing and we are not allowed to add or remove stages unless it is very much necessary. Is there any posibility of using Lookup satge to match first or last value in case of multiple matches of the key

Thanks in advance.

Regards,
Girish

In Link Sort

Posted: Thu Dec 14, 2006 5:29 am
by Mahadev
Hi,
Use the Sort option in look up stage for the reference data and check the box for unique sort. So the duplicates will be removed. This will work fine for small number of reference records, but reduces the performance as the number increases.

Posted: Thu Dec 14, 2006 5:52 am
by ajith
dsguy wrote:Hi Mahadev

The jobs are already existing and we are not allowed to add or remove stages unless it is very much necessary. Is there any posibility of using Lookup satge to match first or last value in case of multiple matches of the key

Thanks in advance.

Regards,
Girish
There is an option to take all the values in case of a Multiple look up

"To specify that a link can legitimately return multiple rows:
Select the link name from the Multiple rows returned from link
drop-down list (note that only one reference link in a Lookup stage
is allowed to return multiple rows, and that this feature is only
available for in-memory lookups)."

Posted: Thu Dec 14, 2006 6:19 am
by dsguy
Thanks a lot.

Regards,
Girish

Posted: Thu Dec 14, 2006 11:35 am
by vmcburney
ajith wrote: There is an option to take all the values in case of a Multiple look up

"To specify that a link can legitimately return multiple rows:
Select the link name from the Multiple rows returned from link
drop-down list (note that only one reference link in a Lookup stage
is allowed to return multiple rows, and that this feature is only
available for in-memory lookups)."
Warning, this will change the behavior of your job. This will duplicate your input data for each version of the lookup data found. Currently your job produces a warning and one output row for that duplicate lookup. If you change the property to accept multiple lookups it will create one output row for each duplicate lookup found, thus duplicating your input data. This is a dangerous property to set unless you want duplicate primary data.

You could analyze the reference data to find out why there are duplicates and clean it up or add an extra column to the join criteria. The point of linking two tables is to have a robust link between them with referential integrity. You might also be able to replace the reference table with a view that removes duplicates thus requiring very little change to your DataStage job.

Posted: Thu Dec 14, 2006 2:43 pm
by ray.wurlod
What Vincent suggested can be accomplished by removing duplicates from the reference Data Set.

Of course it is sometimes legitimate to return multiple values (and thereby to generate more output rows than you have stream input rows), for example when performing a range lookup or a lookup against a secondary key.