SCD stage not detecting a record already exists

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

SCD stage not detecting a record already exists

Post by ggarze »

Running version 9.x on Linux.

Starting last week, we've seen a random occurrence of jobs using the SCD stage that do not appear to be partitioning the data correctly. Some input records are not detected as already present, so the stage treats them as new records and inserts them into the target dimension table, causing duplicate records on the natural keys.

Also, despite the stage throwing the warning "Ignoring duplicate key entry trying to be inserted; no further warnings will be issued for this table", it still inserts the record into the target dimension.
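
As a rough illustration of what that warning implies -- a first-wins lookup build in which a duplicate natural key is dropped rather than replacing the first entry -- here is a minimal Python sketch. This is not the stage's actual code, and the field names are invented:

Code:

# Hypothetical sketch of a first-wins lookup build: duplicate natural
# keys in the reference data are ignored, mirroring the SCD warning
# "Ignoring duplicate key entry trying to be inserted".
reference_rows = [
    {"cust_id": 101, "region": "East", "sk": 1},
    {"cust_id": 101, "region": "East", "sk": 9},  # duplicate natural key
    {"cust_id": 202, "region": "West", "sk": 2},
]

lookup = {}
for row in reference_rows:
    key = (row["cust_id"], row["region"])
    if key in lookup:
        # Duplicate key: keep the first entry, warn once, move on.
        print(f"Ignoring duplicate key entry {key}")
        continue
    lookup[key] = row

print(lookup[(101, "East")]["sk"])  # -> 1 (first entry wins)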

In the SCD stage, both the incoming source data set and the database reference link are set to manually hash-partition and sort on the same 3 keys.
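
To illustrate why both links must be hash-partitioned on identical keys, here is a minimal Python sketch of the failure mode only -- not the stage's implementation; the key values, partition count, and the use of Python's built-in hash are all invented for the example. When the source link is partitioned on a different key set than the reference, a source row can land in a partition whose local slice of the reference data does not contain its match, so the stage sees it as new:

Code:

# Sketch of why source and reference must be hash-partitioned on the
# SAME keys: each partition only sees its own slice of the reference
# data, so a row routed to the "wrong" partition finds no match.
# Note: Python salts string hashes per process, so the output varies
# from run to run -- much like the intermittent behavior described here.
NUM_PARTITIONS = 4

def partition(key_fields):
    """Deterministic (per process) hash partitioner over key values."""
    return hash(key_fields) % NUM_PARTITIONS

reference = [("A", 1), ("B", 2), ("C", 3)]
source    = [("A", 1), ("B", 2)]

# Reference partitioned on both key fields...
ref_parts = {p: set() for p in range(NUM_PARTITIONS)}
for rec in reference:
    ref_parts[partition(rec)].add(rec)

# ...but source partitioned on only the first field (a mismatch).
for rec in source:
    p = partition((rec[0],))  # wrong key set
    status = "update" if rec in ref_parts[p] else "insert (duplicate!)"
    print(rec, "->", status)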

Does anyone have any idea what might be going on all of a sudden? The server has been rebooted and the issue still occurs. Again, it's random: we can delete the duplicates on a table and rerun the job, and it detects that the record exists and treats the inputs as updates, not inserts. We tried this both without recompiling and with recompiling, and it works OK either way. It only seems to happen during normal automated runs when other jobs are running in the batch schedule.

Thanks,
Glenn
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What has changed since it was working properly? (Hint: "nothing" is not the correct answer.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

Post by ggarze »

We added 100 GB of scratch space and added the environment variable APT_OLD_BOUNDED_LENGTH, but it's set to False in the environment.
ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

Post by ggarze »

One of the difficulties here is that it's not consistent. You could delete the duplicates that got inserted, rerun the ETL job, and the job would detect that the record existed and just update.

Anyway, we removed APT_OLD_BOUNDED_LENGTH from the environment variable section, and last night no job using the SCD stage threw the warning and then inserted a duplicate record on the natural key.

A colleague found the article below, which describes a bug in the interaction between this environment variable and the SCD stage. The article points to the variable being "turned on", which we assume means its value is set to "True", but we removed it altogether, because the variable appears to have caused the issue even though it was set to "False" in our environment.

http://www-01.ibm.com/support/docview.w ... wg1JR45634
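
For anyone wanting to confirm the variable really is gone at the project level, here is a minimal Python sketch that scans the project's DSParams file (where project-level environment variable definitions live) for any mention of the name. The install path is an assumption -- adjust it for your environment:

Code:

# Hedged sketch: scan a project's DSParams file for any definition of
# APT_OLD_BOUNDED_LENGTH. The path is an assumption -- adjust for your
# install (commonly under .../Server/Projects/<project>/DSParams).
from pathlib import Path

dsparams = Path("/opt/IBM/InformationServer/Server/Projects/MYPROJECT/DSParams")

for lineno, line in enumerate(dsparams.read_text().splitlines(), start=1):
    if "APT_OLD_BOUNDED_LENGTH" in line:
        print(f"{dsparams}:{lineno}: {line}")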

Hopefully this was the cause and we'll continue to monitor.