Duplicates to CDC stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Duplicates to CDC stage

Post by kris007 »

Hi All,
I am using Change Capture stage and the after dataset has duplicates I set the Change Mode =Explicit Keys,All Values as I want all the columns to be compared not just the key column. Both the inputs are sorted and has partitioned on the Key column but the problem is all the duplicate records are coming with the ChangeCode=3 i.e. inserts they should have been copies with the ChangeCode=0.I am sure others have faced this problem before can any one suggest me how to go about this problem other than remove duplicates from dataset option.
Kris

Where's the "Any" key?-Homer Simpson
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Re: Duplicates to CDC stage

Post by deployDS »

I faced a similar problem when I did not sort the key columns in same order on both partitions.
Not the sort order but the precedence.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Thanks dsdeploy.But I only have one key field here to sort, so precedence doesnt exist.I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage.Any other ideas what might be causing the problem.
Kris

Where's the "Any" key?-Homer Simpson
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

check with data types, there may be white spaces in the key column values of either after records or before records.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.
The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work.
Kris

Where's the "Any" key?-Homer Simpson
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

keshav0307 wrote:check with data types, there may be white spaces in the key column values of either after records or before records.
The DataType of the key is Integer but still I tried to trim(StripWhiteSpace(inoldCust_Dim.CUST_KEY)). It didnot work, it still is passing duplicates as inserts.
Kris

Where's the "Any" key?-Homer Simpson
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

I am facing the same issue even using difference stage, the duplicates are sent in as insert records. and moreover how do I get the records from the after dataset when using difference stage?
Kris

Where's the "Any" key?-Homer Simpson
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Post by deployDS »

kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

deployDS wrote:
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?
Yes. Just in case..
Kris

Where's the "Any" key?-Homer Simpson
mab_arif16
Charter Member
Charter Member
Posts: 87
Joined: Sat Mar 18, 2006 11:45 pm

Post by mab_arif16 »

kris007 wrote:
deployDS wrote:
kris007 wrote:I am doing Hash partitioning by the same key field in both inputs and also performing hash partitioning again in the CDC stage..
I didn't understand this part. What do you mean by partitioning again in CDC stage?
Yes. Just in case..
did this got resolved

thanks
arif
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mab_arif16
Charter Member
Charter Member
Posts: 87
Joined: Sat Mar 18, 2006 11:45 pm

Post by mab_arif16 »

ray.wurlod wrote:Surely the Change Capture stage requires identical key partitioning (so that valid matches can be made) AND identical sorting (so that memory can be efficiently used). Try adding sorting on both input links, using the key column(s) as the one(s) to be sorted. That way, out of order rows will not be incorrectly described as inserts.
Even if we partition and sort identically ,if there are duplicates keys in the after data set ,the CDC stage will mark the duplicate rows as inserts .
And if you have duplicates in the before data set ,the CDC stage calculates the change code by comparing only the first record of before set where key columns match.
I think the CDC stage works perfectly only when there are NO duplicates in the before and the after data set.

Please correct me if I am wrong
Thanks
Arif
mbsiddu
Participant
Posts: 7
Joined: Fri Mar 24, 2006 11:36 pm
Location: Delhi

Post by mbsiddu »

chk the propertis....and set...
CDC stage > properties > options >

drop output for copy = true
drop output for delete = false
drop output for insert = false
drop output for edit = false
Post Reply