Page 1 of 1

Duplicates to CDC stage

Posted: Wed Jul 05, 2006 11:47 am
by kris007
Hi All,
I am using Change Capture stage and the after dataset has duplicates I set the Change Mode =Explicit Keys,All Values as I want all the columns to be compared not just the key column. Both the inputs are sorted and has partitioned on the Key column but the problem is all the duplicate records are coming with the ChangeCode=3 i.e. inserts they should have been copies with the ChangeCode=0.I am sure others have faced this problem before can any one suggest me how to go about this problem other than remove duplicates from dataset option.

Re: Duplicates to CDC stage

Posted: Wed Jul 05, 2006 1:20 pm
by deployDS
I faced a similar problem when I did not sort the key columns in same order on both partitions.
Not the sort order but the precedence.

Posted: Wed Jul 05, 2006 2:11 pm
by kris007
Thanks dsdeploy.But I only have one key field here to sort, so precedence doesnt exist.I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage.Any other ideas what might be causing the problem.

Posted: Thu Jul 06, 2006 4:13 am
by keshav0307
check with data types, there may be white spaces in the key column values of either after records or before records.

Posted: Thu Jul 06, 2006 10:57 am
by kris007
keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.
The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work.

Posted: Thu Jul 06, 2006 11:00 am
by kris007
keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.
The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work, it still is passing duplicates as inserts.

Posted: Thu Jul 06, 2006 3:19 pm
by kris007
I am facing the same issue even using difference stage, the duplicates are sent in as insert records. and moreover how do I get the records from the after dataset when using difference stage?

Posted: Thu Jul 06, 2006 3:28 pm
by deployDS
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?

Posted: Thu Jul 06, 2006 4:21 pm
by kris007
deployDS wrote:
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?
Yes. Just in case..

Posted: Wed Nov 01, 2006 7:26 pm
by mab_arif16
kris007 wrote:
deployDS wrote:
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?
Yes. Just in case..
did this got resolved

thanks
arif

Posted: Wed Nov 01, 2006 11:42 pm
by ray.wurlod
Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.

Posted: Thu Nov 02, 2006 8:48 am
by mab_arif16
ray.wurlod wrote:Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.
Even if we partition and sort identically ,if there are duplicates keys in the after data set ,the CDC stage will mark the duplicate rows as inserts .
And if you have duplicates in the before data set ,the CDC stage calculates the change code by comparing only the first record of before set where key columns match.
I think the CDC stage works perfectly only when there are NO duplicates in the before and the after data set.

Please correct me if I am wrong
Thanks
Arif

Posted: Fri Nov 03, 2006 2:57 am
by mbsiddu
chk the propertis....and set...
CDC stage > properties > options >

drop output for copy = true
drop output for delete = false
drop output for insert = false
drop output for edit = false