Duplicates to CDC stage

kris007 · Post by **kris007** » Wed Jul 05, 2006 11:47 am

Hi All,
I am using Change Capture stage and the after dataset has duplicates I set the Change Mode =Explicit Keys,All Values as I want all the columns to be compared not just the key column. Both the inputs are sorted and has partitioned on the Key column but the problem is all the duplicate records are coming with the ChangeCode=3 i.e. inserts they should have been copies with the ChangeCode=0.I am sure others have faced this problem before can any one suggest me how to go about this problem other than remove duplicates from dataset option.

deployDS · Post by **deployDS** » Wed Jul 05, 2006 1:20 pm

I faced a similar problem when I did not sort the key columns in same order on both partitions.
Not the sort order but the precedence.

kris007 · Post by **kris007** » Wed Jul 05, 2006 2:11 pm

Thanks dsdeploy.But I only have one key field here to sort, so precedence doesnt exist.I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage.Any other ideas what might be causing the problem.

keshav0307 · Post by **keshav0307** » Thu Jul 06, 2006 4:13 am

check with data types, there may be white spaces in the key column values of either after records or before records.

kris007 · Post by **kris007** » Thu Jul 06, 2006 10:57 am

keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.

The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work.

kris007 · Post by **kris007** » Thu Jul 06, 2006 11:00 am

keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.

The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work, it still is passing duplicates as inserts.

kris007 · Post by **kris007** » Thu Jul 06, 2006 3:19 pm

I am facing the same issue even using difference stage, the duplicates are sent in as insert records. and moreover how do I get the records from the after dataset when using difference stage?

deployDS · Post by **deployDS** » Thu Jul 06, 2006 3:28 pm

kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..

I didn't understand this part. What do you mean by partitioning again in CDC stage?

kris007 · Post by **kris007** » Thu Jul 06, 2006 4:21 pm

deployDS wrote:
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?

Yes. Just in case..

mab_arif16 · Post by **mab_arif16** » Wed Nov 01, 2006 7:26 pm

kris007 wrote:
deployDS wrote:
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?
Yes. Just in case..

did this got resolved

thanks
arif

ray.wurlod · Post by **ray.wurlod** » Wed Nov 01, 2006 11:42 pm

Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.

mab_arif16 · Post by **mab_arif16** » Thu Nov 02, 2006 8:48 am

ray.wurlod wrote:Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.

Even if we partition and sort identically ,if there are duplicates keys in the after data set ,the CDC stage will mark the duplicate rows as inserts .
And if you have duplicates in the before data set ,the CDC stage calculates the change code by comparing only the first record of before set where key columns match.
I think the CDC stage works perfectly only when there are NO duplicates in the before and the after data set.

Please correct me if I am wrong
Thanks
Arif

mbsiddu · Post by **mbsiddu** » Fri Nov 03, 2006 2:57 am

chk the propertis....and set...
CDC stage > properties > options >

drop output for copy = true
drop output for delete = false
drop output for insert = false
drop output for edit = false

DSXchange

Duplicates to CDC stage

Duplicates to CDC stage

Re: Duplicates to CDC stage