Ignoring duplicate entry warning despite only 1 matching row

Gazelle
Premium Member
Posts: 108
Joined: Mon Nov 24, 2003 11:36 pm
Location: Australia (Melbourne)

Post by Gazelle »

I just learned another thing about DataStage, and thought it might help others:
the lookup stage seems to check for duplicates in the reference stream before trying to match them to the incoming data stream.

Symptoms:
The job log showed a warning message for the lookup stage:
lkupOrgHierKey,0: Ignoring duplicate entry at table record 57; no further warnings will be issued for this table

Verification:
1. After reducing the test data down to just 2 incoming rows, I checked the reference data and confirmed that there was only 1 matching row for each of the 2 source rows.
2. After allowing duplicates on the reference link for the lookup stage, there were still only 2 input rows and 2 output rows; if there really were duplicates, I'd expect more than 2 output rows.
3. Inserting a remove-duplicates stage on the reference link does remove some duplicate rows, but only rows that did not match the incoming data (well, they were a 1/4 match: these rows had only 1 of the 4 key values matching, and the other 3 key values were null).

Can anyone confirm that the lookup stage will warn of duplicates even if they do not match any of the source data?
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

Hi,

A lookup stage loads the reference data set into memory before it starts matching the data. While preparing that data for the lookup, it keeps only the first record for each set of key values and drops all the others. Hence the warning message "Ignoring duplicate entry at table record XXXX; no further warnings will be issued for this table".
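
To illustrate the behaviour described above, here is a minimal Python sketch. It is only a model of the idea, not DataStage's actual implementation: the function names, the sample rows, and the 0-based record numbering are all assumptions made for the example. The point is that the duplicate check happens while the table is built, before any source row is read, so a duplicate that matches no source row still raises the warning.

```python
def build_lookup_table(reference_rows, key_fields):
    """Load reference rows into an in-memory table, keeping the first row per key."""
    table = {}
    warnings = []
    for record_no, row in enumerate(reference_rows):
        key = tuple(row[f] for f in key_fields)
        if key in table:
            # Duplicate key: the row is dropped; only the first duplicate is reported.
            if not warnings:
                warnings.append(
                    f"Ignoring duplicate entry at table record {record_no}; "
                    "no further warnings will be issued for this table"
                )
            continue
        table[key] = row
    return table, warnings


def lookup(source_rows, table, key_fields):
    """Match each source row against the already-built table."""
    output = []
    for row in source_rows:
        match = table.get(tuple(row[f] for f in key_fields))
        if match is not None:
            output.append({**match, **row})
    return output


# Hypothetical data: the two "X" rows are duplicates of each other
# but match neither of the source rows.
KEYS = ("org", "dept")
reference_rows = [
    {"org": "A", "dept": "1", "name": "Alpha"},
    {"org": "B", "dept": "2", "name": "Beta"},
    {"org": "X", "dept": None, "name": "Orphan 1"},
    {"org": "X", "dept": None, "name": "Orphan 2"},
]
source_rows = [{"org": "A", "dept": "1"}, {"org": "B", "dept": "2"}]

table, warnings = build_lookup_table(reference_rows, KEYS)  # warning raised here
output = lookup(source_rows, table, KEYS)                   # source read only now
print(warnings)
print(len(output))
```

In this model the warning is produced by `build_lookup_table` alone, which never sees `source_rows`, while the output still has one row per source row.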

If you want all the matching records from the reference link to be returned, set the following property on the Lookup stage: Constraints tab -> "Multiple rows returned from link", and select the reference link from which you need the duplicates to be returned.
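
As a rough sketch of what returning multiple rows means for the output row count (again a Python model with made-up names and data, not the stage's real internals): every reference row is kept per key, and each source row produces one output row per match. Duplicates whose keys match no source row add nothing to the output, which fits the 2-in, 2-out result reported in the first post.

```python
from collections import defaultdict


def build_multi_table(reference_rows, key_fields):
    """Keep every reference row, grouped by key, instead of dropping duplicates."""
    table = defaultdict(list)
    for row in reference_rows:
        table[tuple(row[f] for f in key_fields)].append(row)
    return table


def lookup_all(source_rows, table, key_fields):
    """Emit one output row per (source row, matching reference row) pair."""
    output = []
    for row in source_rows:
        key = tuple(row[f] for f in key_fields)
        for match in table.get(key, []):
            output.append({**match, **row})
    return output


# Hypothetical data, same shape as a 2-key lookup.
KEYS = ("org", "dept")
reference_rows = [
    {"org": "A", "dept": "1", "name": "Alpha"},
    {"org": "B", "dept": "2", "name": "Beta"},
    {"org": "X", "dept": None, "name": "Orphan 1"},
    {"org": "X", "dept": None, "name": "Orphan 2"},
]
multi_table = build_multi_table(reference_rows, KEYS)

# Duplicates exist, but not on keys present in the source: still 2 rows out.
unmatched_dupes = lookup_all(
    [{"org": "A", "dept": "1"}, {"org": "B", "dept": "2"}], multi_table, KEYS
)

# A source row that does hit the duplicated key yields 2 rows for that one input.
matched_dupes = lookup_all([{"org": "X", "dept": None}], multi_table, KEYS)

print(len(unmatched_dupes), len(matched_dupes))
```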

Hope this answers your question.
Harish