Lookup stage key multiple matches handling

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Lookup stage key multiple matches handling

Post by dsguy »

Hi

I have multiple key values coming from the reference table there by getting a warning . The reference table is from sybase and the source is a dataset.

Please let me know how to handle muliple matches in lookup stage.

Warning
-----------
Lkp1,0: Ignoring duplicate entry at table record 140; no further warnings will be issued for this table

Thanks,
Girish
Mahadev
Participant
Posts: 5
Joined: Tue Nov 21, 2006 5:13 am
Location: Bangalore

Unique sort.

Post by Mahadev »

Use a Sort Stage for the reference data, sort on the key fields and change "Allow duplicates" property to "False" or use a Remove Duplicates stage.

Hope this helps,
Mahadev V
dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Post by dsguy »

Hi Mahadev

The jobs are already existing and we are not allowed to add or remove stages unless it is very much necessary. Is there any posibility of using Lookup satge to match first or last value in case of multiple matches of the key

Thanks in advance.

Regards,
Girish
Mahadev
Participant
Posts: 5
Joined: Tue Nov 21, 2006 5:13 am
Location: Bangalore

In Link Sort

Post by Mahadev »

Hi,
Use the Sort option in look up stage for the reference data and check the box for unique sort. So the duplicates will be removed. This will work fine for small number of reference records, but reduces the performance as the number increases.
Mahadev V
ajith
Participant
Posts: 86
Joined: Thu Nov 10, 2005 11:10 pm

Post by ajith »

dsguy wrote:Hi Mahadev

The jobs are already existing and we are not allowed to add or remove stages unless it is very much necessary. Is there any posibility of using Lookup satge to match first or last value in case of multiple matches of the key

Thanks in advance.

Regards,
Girish
There is an option to take all the values in case of a Multiple look up

"To specify that a link can legitimately return multiple rows:
Select the link name from the Multiple rows returned from link
drop-down list (note that only one reference link in a Lookup stage
is allowed to return multiple rows, and that this feature is only
available for in-memory lookups)."
dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Post by dsguy »

Thanks a lot.

Regards,
Girish
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

ajith wrote: There is an option to take all the values in case of a Multiple look up

"To specify that a link can legitimately return multiple rows:
Select the link name from the Multiple rows returned from link
drop-down list (note that only one reference link in a Lookup stage
is allowed to return multiple rows, and that this feature is only
available for in-memory lookups)."
Warning, this will change the behavior of your job. This will duplicate your input data for each version of the lookup data found. Currently your job produces a warning and one output row for that duplicate lookup. If you change the property to accept multiple lookups it will create one output row for each duplicate lookup found, thus duplicating your input data. This is a dangerous property to set unless you want duplicate primary data.

You could analyze the reference data to find out why there are duplicates and clean it up or add an extra column to the join criteria. The point of linking two tables is to have a robust link between them with referential integrity. You might also be able to replace the reference table with a view that removes duplicates thus requiring very little change to your DataStage job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What Vincent suggested can be accomplished by removing duplicates from the reference Data Set.

Of course it is sometimes legitimate to return multiple values (and thereby to generate more output rows than you have stream input rows), for example when performing a range lookup or a lookup against a secondary key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply