Page 1 of 1

Lookup stage--Duplicate

Posted: Thu Feb 10, 2011 10:14 pm
by Pk39565
I have a job designed as below.

Sequential File is making a look up on table and wrting to another file.

The column which is used to make a look up is defined as decimal and in table it is varchar.

so I converted varchar to decimal by using transformer stage and then made a look up.
The issue is table has some character data and they are unique .look up stage is throwing an error .

Warning Message:
Ignoring duplicate entry at table record 1; no further warnings will be issued for this table

What I have understood is Look up stage is not identifying character data uniquly .How can this be fixed.

thanks

Posted: Thu Feb 10, 2011 10:36 pm
by jwiles
You cannot convert non-numeric strings to decimal...they will default to 0. This is the cause of your "duplicate" key warning messages.

The best (or easiest) solution:

Remove the table records with non-numeric strings. Your incoming records won't match to them anyway so why have them in the table? Use the isvalid() function with a constraint in a transformer to weed them out before the lookup stage.

Or perhaps have the SQL query in your database stage drop them instead.

Regards,

Re: Lookup stage--Duplicate

Posted: Fri Feb 11, 2011 2:59 am
by synsog
Issue is you have multiple records in the reference table having the same vaule in the column with which you are performing lookup.
handle that before you perform lookup

Re: Lookup stage--Duplicate

Posted: Thu Feb 17, 2011 11:31 pm
by sridinesh2009
in lookup stage-> constraints-> select the column in multiple rows returned drop down list. this will solve ur problem.

Posted: Thu Feb 17, 2011 11:42 pm
by nani0907
we can also use remove duplicate stage for refernce link to avoid the duplicates

Posted: Fri Feb 18, 2011 1:21 am
by jwiles
Remove Duplicate stage will require that the reference link data be partitioned and sorted, which is typically what you are trying to avoid doing by using the Lookup stage in the first place. Using RD will add unnecessary complexity and overhead to the job, which is a wasteful and poor design practice. The suggestion of using the constraint, while it may work, does not address the actual cause of the duplicates.

The true cause of the problem is that the reference data source table contains non-numeric data in the key column, which the OP is trying to convert to the same datatype as the incoming data stream (a decimal datatype). Every non-numeric value converts to the same default value, which causes the duplicates. The proper method to solve this is to remove the invalid non-numeric data before conversion to decimal by either using isvalid() or isnum() and a constraint in a transformer or by removing them in the SQL used to extract the table data in the first place.

Regards,