Ignoring duplicate entry warning despite only 1 matching row
Posted: Sun Jan 20, 2013 7:44 pm
I just learned another thing about DataStage, and thought it might help others:
the lookup stage seems to check the reference stream for duplicate keys before it tries to match any reference rows to the incoming data stream.
Symptoms:
The job log showed a warning message for the lookup stage:
lkupOrgHierKey,0: Ignoring duplicate entry at table record 57; no further warnings will be issued for this table
Verification:
1. After reducing the test data down to just 2 incoming rows, I checked the reference data and confirmed that there was only 1 matching row for each of the 2 source rows.
2. After allowing duplicates on the reference link for the lookup stage, there were still only 2 input rows and 2 output rows; if there really were duplicates, I'd expect more than 2 output rows.
3. Inserting a remove-duplicates stage on the reference link does remove some duplicate rows, but only rows that did not match the incoming data. (Well, they were a 1/4 match: these rows had only 1 of the 4 keys matching, and the other 3 key values were null.)
Can anyone confirm that the lookup stage will warn of duplicates even if they do not match any of the source data?
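To make the suspected behaviour concrete, here is a rough Python model of it. This is only a sketch of my assumption, not how DataStage is actually implemented: the reference rows are loaded into a keyed table first, and the duplicate warning is raised during that load, before any source row is looked up. The function names, the column names k1 to k4, the sample data, and the record numbering are all made up for illustration. The sketch also assumes null key values compare as equal, which fits what the remove-duplicates stage did in step 3.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Key = Tuple[Optional[str], ...]


def build_lookup_table(reference_rows: List[Row], key_columns: List[str]):
    """Load reference rows into a keyed table, warning once on duplicate keys.

    Hypothetical model: the duplicate check happens here, at load time,
    with no knowledge of which keys the source stream will ask for.
    """
    table: Dict[Key, Row] = {}
    log: List[str] = []
    for record_number, row in enumerate(reference_rows):
        key = tuple(row.get(col) for col in key_columns)
        if key in table:
            # Only the first duplicate is reported, as in the job log message.
            if not log:
                log.append(
                    f"Ignoring duplicate entry at table record {record_number}; "
                    "no further warnings will be issued for this table"
                )
            continue  # keep the first row seen for this key
        table[key] = row
    return table, log


def probe(table: Dict[Key, Row], source_rows: List[Row], key_columns: List[str]):
    """Match each source row against the already-built table."""
    matched = []
    for row in source_rows:
        key = tuple(row.get(col) for col in key_columns)
        if key in table:
            matched.append({**row, **table[key]})
    return matched


KEYS = ["k1", "k2", "k3", "k4"]  # hypothetical 4-column key

reference = [
    {"k1": "A", "k2": "1", "k3": "x", "k4": "p", "org": "North"},
    {"k1": "B", "k2": "2", "k3": "y", "k4": "q", "org": "South"},
    # Two rows with 1 key populated and 3 null keys: duplicates of each other,
    # but neither fully matches any source row.
    {"k1": "A", "k2": None, "k3": None, "k4": None, "org": "Orphan 1"},
    {"k1": "A", "k2": None, "k3": None, "k4": None, "org": "Orphan 2"},
]

source = [
    {"k1": "A", "k2": "1", "k3": "x", "k4": "p"},
    {"k1": "B", "k2": "2", "k3": "y", "k4": "q"},
]

table, log = build_lookup_table(reference, KEYS)
matched = probe(table, source, KEYS)

for message in log:
    print(message)
print(len(source), "input rows,", len(matched), "output rows")
```

In this model the warning appears even though each of the 2 source rows has exactly 1 matching reference row, which is the same pattern I saw in the job log.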