Duplicates to CDC stage
Moderators: chulett, rschirm, roy
Duplicates to CDC stage
Hi All,
I am using Change Capture stage and the after dataset has duplicates I set the Change Mode =Explicit Keys,All Values as I want all the columns to be compared not just the key column. Both the inputs are sorted and has partitioned on the Key column but the problem is all the duplicate records are coming with the ChangeCode=3 i.e. inserts they should have been copies with the ChangeCode=0.I am sure others have faced this problem before can any one suggest me how to go about this problem other than remove duplicates from dataset option.
I am using Change Capture stage and the after dataset has duplicates I set the Change Mode =Explicit Keys,All Values as I want all the columns to be compared not just the key column. Both the inputs are sorted and has partitioned on the Key column but the problem is all the duplicate records are coming with the ChangeCode=3 i.e. inserts they should have been copies with the ChangeCode=0.I am sure others have faced this problem before can any one suggest me how to go about this problem other than remove duplicates from dataset option.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
Re: Duplicates to CDC stage
I faced a similar problem when I did not sort the key columns in same order on both partitions.
Not the sort order but the precedence.
Not the sort order but the precedence.
Thanks dsdeploy.But I only have one key field here to sort, so precedence doesnt exist.I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage.Any other ideas what might be causing the problem.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work, it still is passing duplicates as inserts.keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
Yes. Just in case..deployDS wrote:I didn't understand this part. What do you mean by partitioning again in CDC stage?kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
-
- Charter Member
- Posts: 87
- Joined: Sat Mar 18, 2006 11:45 pm
did this got resolvedkris007 wrote:Yes. Just in case..deployDS wrote:I didn't understand this part. What do you mean by partitioning again in CDC stage?kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
thanks
arif
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Charter Member
- Posts: 87
- Joined: Sat Mar 18, 2006 11:45 pm
Even if we partition and sort identically ,if there are duplicates keys in the after data set ,the CDC stage will mark the duplicate rows as inserts .ray.wurlod wrote:Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.
And if you have duplicates in the before data set ,the CDC stage calculates the change code by comparing only the first record of before set where key columns match.
I think the CDC stage works perfectly only when there are NO duplicates in the before and the after data set.
Please correct me if I am wrong
Thanks
Arif