
Duplicates while inserting into target

Posted: Mon Dec 18, 2006 12:30 am
by 120267
Hi,

We are facing a peculiar problem in DataStage. We are running an SCD2 job: from the source we load into a hashed file, and from the hashed file we insert into the target. It is a row-by-row commit, and before inserting we have an Oracle stage lookup to generate the surrogate key.

Our problem is that sometimes the same record is inserted into the target twice with different surrogate keys. The insert timestamp is also the same for those records; the only difference is the surrogate key.

We are using Link Collector, Inter-process, and Hashed File stages in the job. This happens only once in a while. Can you please help me resolve this issue?
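
One way this symptom can arise, assuming the Inter-process stage splits the stream and the same natural key can land on both branches before either insert is committed, is a classic lookup/insert race: both branches look up the target, both see "not found", and both insert. A minimal simulation of that interleaving (Python; every name here is illustrative, not a DataStage API):

```python
# Hypothetical race sketch: two inter-process partitions each look up the
# target before inserting. If both lookups run before either commit lands,
# the same natural key goes in twice with different surrogate keys --
# the symptom described above.
import itertools

target = {}                      # natural_key -> surrogate_key (committed rows)
next_sk = itertools.count(100)   # stands in for the Oracle-side key generation

def lookup(row):
    nk = row["natural_key"]
    return nk, nk in target      # step 1: lookup against committed target

# Same natural key arrives on both partitions (e.g. the stream was split
# without partitioning on the key, then recombined by a link collector).
rows = [{"natural_key": "CUST-1"}, {"natural_key": "CUST-1"}]

# Interleaving: BOTH lookups happen before EITHER insert is committed.
lookups = [lookup(r) for r in rows]
inserted = []
for nk, found in lookups:
    if not found:                # both branches saw "not found"
        inserted.append((next(next_sk), nk))
# inserted -> [(100, 'CUST-1'), (101, 'CUST-1')]
```

With row-by-row commit the window is small, which would match the "once in a while" behaviour.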

Re: Duplicates while inserting into target

Posted: Mon Dec 18, 2006 12:59 am
by ajith
One thing you could do is split the job into two: one that writes the insert records to a dataset, and one that loads them into the Oracle target. Then schedule the second job after the first. That will reduce the database hits too.

Secondly, make sure that your data types match and that there are no leading or trailing spaces in the columns used for the lookup. More importantly, find the reason for the lookup failure.

Again, these suggestions are valid only if my understanding is correct. Try to explain the job flow with a simplified ASCII diagram so that everyone can understand it.
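
Staging the insert records to a file first, as suggested above, also gives a natural place to deduplicate on the natural key before the load job runs. A sketch of that dedup step (Python; the column names are assumptions):

```python
# Deduplicate the staged insert file on the natural key, keeping the
# first occurrence, before the second (load) job picks it up.
import csv
import io

# Stands in for the staged dataset written by the first job.
staged = io.StringIO(
    "natural_key,amount\n"
    "CUST-1,10\n"
    "CUST-1,10\n"   # duplicate that would otherwise get a second surrogate key
    "CUST-2,5\n"
)

seen = set()
deduped = []
for row in csv.DictReader(staged):
    if row["natural_key"] not in seen:
        seen.add(row["natural_key"])
        deduped.append(row)
# deduped now holds one row per natural key: CUST-1, CUST-2
```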

Posted: Mon Dec 18, 2006 7:11 am
by DSguru2B
How many keys do you have defined in your target table? SCD2 means you will have to update too. Are you doing that, and how exactly are you doing it?

Posted: Tue Dec 19, 2006 7:48 am
by DSguru2B
How can you update based on the surrogate key when you are generating those as you insert? You need to update based on the natural key. Also, if the surrogate key is the only key column defined, how are you getting duplicates? How many columns have unique indexes on them? I am confused :roll:

Posted: Wed Dec 20, 2006 12:00 am
by 120267
Hi,

My source has a left outer join with the target to get the target surrogate key. If the CDC columns have changed for that natural key, I insert one new record with a new surrogate key and update the old record based on the SK column. As I mentioned earlier, it is a row-by-row commit with a grouped transaction.
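
The flow described above can be sketched as follows (Python with an in-memory SQLite table; the table and column names `dim_customer`, `natural_key`, `sk`, `cdc_col`, `current_flag` are illustrative assumptions, not the actual job's metadata). The point is that the expiry update hits the old row via the SK obtained from the lookup, while the insert carries a fresh surrogate key:

```python
# Minimal SCD2 apply sketch: look up the current row, and if the CDC
# column changed, expire the old version and insert a new one.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE dim_customer (
    sk INTEGER PRIMARY KEY,
    natural_key TEXT,
    cdc_col TEXT,
    current_flag TEXT)""")
con.execute("INSERT INTO dim_customer VALUES (1, 'CUST-1', 'old', 'Y')")

def apply_scd2(natural_key, cdc_value):
    # Left-outer-join equivalent: fetch the current row, if any.
    cur = con.execute(
        "SELECT sk, cdc_col FROM dim_customer "
        "WHERE natural_key = ? AND current_flag = 'Y'", (natural_key,))
    row = cur.fetchone()
    if row and row[1] != cdc_value:
        # CDC columns changed: expire the old record by its SK...
        con.execute("UPDATE dim_customer SET current_flag = 'N' "
                    "WHERE sk = ?", (row[0],))
        # ...and insert the new version with a new surrogate key.
        con.execute("INSERT INTO dim_customer (natural_key, cdc_col, "
                    "current_flag) VALUES (?, ?, 'Y')",
                    (natural_key, cdc_value))
    con.commit()

apply_scd2("CUST-1", "new")
rows = con.execute(
    "SELECT sk, cdc_col, current_flag FROM dim_customer ORDER BY sk"
).fetchall()
# rows -> [(1, 'old', 'N'), (2, 'new', 'Y')]
```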

Posted: Wed Dec 20, 2006 1:52 am
by kumar_s
Check the logic of how you update the records. It might be the case that even updates go in as inserts with a new surrogate key.
Can you give us more specific cases of when the duplicates occur? Are all the records from the source being populated as duplicates?

Posted: Wed Dec 20, 2006 9:57 am
by kris007
You might end up with duplicates if you have not defined your keys properly. What are your keys?