SCD stage not detecting a record already exists

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

SCD stage not detecting a record already exists

Post by ggarze »

Running version 9.x on Linux.

Starting last week, we've seen a random occurrence of jobs using the SCD stage that do not appear to be partitioning the data correctly. Some input records are not detected as already present, so the stage treats them as new records and inserts them into the target dimension table, causing duplicate records on the natural keys.

Also, despite the stage throwing the warning "Ignoring duplicate key entry trying to be inserted; no further warnings will be issued for this table", it still inserts the record into the target dimension.
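
As a rough illustration of what that warning implies -- a first-wins lookup build in which a duplicate natural key is dropped rather than replacing the first entry -- here is a minimal Python sketch. This is not the stage's actual code, and the field names are invented:

Code:

# Hypothetical sketch of a first-wins lookup build: duplicate natural
# keys in the reference data are ignored, mirroring the SCD warning
# "Ignoring duplicate key entry trying to be inserted".
reference_rows = [
    {"cust_id": 101, "region": "East", "sk": 1},
    {"cust_id": 101, "region": "East", "sk": 9},  # duplicate natural key
    {"cust_id": 202, "region": "West", "sk": 2},
]

lookup = {}
for row in reference_rows:
    key = (row["cust_id"], row["region"])
    if key in lookup:
        # Duplicate key: keep the first entry, warn once, move on.
        print(f"Ignoring duplicate key entry {key}")
        continue
    lookup[key] = row

print(lookup[(101, "East")]["sk"])  # -> 1 (first entry wins)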

In the SCD stage, both the incoming source data set and the database reference link are set to manually hash-partition and sort on the same 3 keys.
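
To illustrate why both links must be hash-partitioned on identical keys, here is a minimal Python sketch of the failure mode only -- not the stage's implementation; the key values, partition count, and the use of Python's built-in hash are all invented for the example. When the source link is partitioned on a different key set than the reference, a source row can land in a partition whose local slice of the reference data does not contain its match, so the stage sees it as new:

Code:

# Sketch of why source and reference must be hash-partitioned on the
# SAME keys: each partition only sees its own slice of the reference
# data, so a row routed to the "wrong" partition finds no match.
# Note: Python salts string hashes per process, so the output varies
# from run to run -- much like the intermittent behavior described here.
NUM_PARTITIONS = 4

def partition(key_fields):
    """Deterministic (per process) hash partitioner over key values."""
    return hash(key_fields) % NUM_PARTITIONS

reference = [("A", 1), ("B", 2), ("C", 3)]
source    = [("A", 1), ("B", 2)]

# Reference partitioned on both key fields...
ref_parts = {p: set() for p in range(NUM_PARTITIONS)}
for rec in reference:
    ref_parts[partition(rec)].add(rec)

# ...but source partitioned on only the first field (a mismatch).
for rec in source:
    p = partition((rec[0],))  # wrong key set
    status = "update" if rec in ref_parts[p] else "insert (duplicate!)"
    print(rec, "->", status)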

Does anyone have any idea what might be going on all of a sudden? The server has been rebooted and the issue still occurs. Again, it's random: we can delete the duplicates on a table and rerun the job, and it detects that the record exists and treats the inputs as updates, not inserts. We tried this both without recompiling and with recompiling, and it works OK either way. It only seems to happen during normal automated runs when other jobs are running in the batch schedule.

Thanks,
Glenn
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What has changed since it was working properly? (Hint: "nothing" is not the correct answer.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

Post by ggarze »

We added 100 GB of scratch space and added the environment variable APT_OLD_BOUNDED_LENGTH, but it's set to False in the environment.
ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

Post by ggarze »

One of the difficulties here is that it's not consistent. You could delete the duplicates that got inserted, rerun the ETL job, and the job would detect that the record existed and just update.

Anyway, we removed APT_OLD_BOUNDED_LENGTH from the environment variable section, and last night no job using the SCD stage threw the warning and then inserted a duplicate record on the natural key.

A colleague found the article below, which describes a bug in the interaction between this environment variable and the SCD stage. The article points to the variable being "turned on", which we assume means its value is set to "True", but we removed it altogether, because the variable appears to have caused the issue even though it was set to "False" in our environment.

http://www-01.ibm.com/support/docview.w ... wg1JR45634
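
For anyone wanting to confirm the variable really is gone at the project level, here is a minimal Python sketch that scans the project's DSParams file (where project-level environment variable definitions live) for any mention of the name. The install path is an assumption -- adjust it for your environment:

Code:

# Hedged sketch: scan a project's DSParams file for any definition of
# APT_OLD_BOUNDED_LENGTH. The path is an assumption -- adjust for your
# install (commonly under .../Server/Projects/<project>/DSParams).
from pathlib import Path

dsparams = Path("/opt/IBM/InformationServer/Server/Projects/MYPROJECT/DSParams")

for lineno, line in enumerate(dsparams.read_text().splitlines(), start=1):
    if "APT_OLD_BOUNDED_LENGTH" in line:
        print(f"{dsparams}:{lineno}: {line}")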

Hopefully this was the cause and we'll continue to monitor.