Query Regarding Change Capture stage malfunction

parag.s.27 · Post by **parag.s.27** » Wed Jan 09, 2008 4:26 am

I have the following scenario:

I am running a Change Capture between the source table and the target table. The source table receives the incremental load.

Currently the data in the source and the target is:

SOURCE: -

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     1             1            NULL  NULL
111     2             2            Y     N

TARGET: -

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     2             2            Y     N

Here RefNO., ApplicantNo. and ApplicantID are the keys for the Change Capture. So the source record with Val1 = NULL and Val2 = NULL should come out of the Change Capture stage as an update, while the remaining two should be exact copies.

But the first source record also comes out of the CC output, and as an Insert with change code = 1. This should not happen, since it is an exact copy. I then removed the record with Val1 = NULL and Val2 = NULL from the source, so source vs. target is now:

SOURCE: -

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     2             2            Y     N

TARGET: -

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     2             2            Y     N

This time nothing came out of the CC stage; it treated both source records as exact copies. So is it the case that when multiple records with the same key combination arrive in the source, the CC stage cannot recognise all of them as copies or updates?

Can someone help with this?

samsuf2002 · Post by **samsuf2002** » Wed Jan 09, 2008 9:47 am

Which properties are you using in the CDC stage? I think the change value property should be 'Explicit keys and All values'.

crouse · Post by **crouse** » Wed Jan 09, 2008 10:24 am

Been there and observed the same behavior.
See my post "viewtopic.php?t=111348".

You can't count on using the CDC stage when the key appears multiple times in the source (before) link.

Bummer, huh? You need to do a join, then some fancy footwork if you plan on loading them into a Type II SCD and want the latest occurrence to become the current row and the others history.

-Craig
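
For anyone wondering what the "join, then some fancy footwork" could look like, here is a rough Python sketch of the logic only. It is not a DataStage job and not necessarily what Craig built. Assumptions: rows are plain tuples whose first three fields are the keys, arrival order within a key decides which row is "latest", and the function name `classify_for_scd2` and the action labels are made up for illustration.

```python
from collections import defaultdict

def classify_for_scd2(source_rows, target_rows, key_width=3):
    """Join source to target on the key and label each source row.

    Assumption: for a repeated key, the last row to arrive is the
    current one and earlier rows are history.
    """
    target_by_key = {row[:key_width]: row for row in target_rows}

    # Group source rows by key, keeping arrival order inside each group.
    grouped = defaultdict(list)
    for row in source_rows:
        grouped[row[:key_width]].append(row)

    result = []
    for key, rows in grouped.items():
        *history, current = rows
        for row in history:
            result.append(("history", row))
        existing = target_by_key.get(key)
        if existing is None:
            action = "insert"
        elif existing == current:
            action = "copy"
        else:
            action = "update"
        result.append((action, current))
    return result

source = [
    (111, 1, 1, "Y", "N"),
    (111, 1, 1, None, None),   # NULL/NULL row from the original post
    (111, 2, 2, "Y", "N"),
]
target = [
    (111, 1, 1, "Y", "N"),
    (111, 2, 2, "Y", "N"),
]

for action, row in classify_for_scd2(source, target):
    print(action, row)
```

With the data from the first post, the Y/N row for applicant 1 is labelled history, the NULL/NULL row is labelled an update, and applicant 2 is a copy. Whether a history row that is identical to the existing target row should be loaded again is a design decision this sketch leaves open.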

Minhajuddin · Post by **Minhajuddin** » Wed Jan 09, 2008 2:35 pm

This is what the documentation says:

The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.

From my tests I found out that the input data must *not* have any duplicates. But, I wonder why the documentation doesn't specify that explicitly.
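
One way to picture why duplicates cause trouble is the toy model below. It assumes the stage walks the two sorted inputs in a one-to-one merge and consumes a before row every time it finds a key match. That is my assumption, not something the documentation states, and the function `change_capture` is only a simulation written for this thread. The insert code 1 is the one reported above; the other codes (copy 0, delete 2, edit 3) are assumed defaults. The target is treated as the before input and the source as the after input.

```python
KEY_WIDTH = 3  # RefNO., ApplicantNo., ApplicantID

# INSERT = 1 is the code reported in this thread; the rest are assumed.
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3

def change_capture(before, after):
    """Toy one-to-one merge compare over inputs already sorted by key."""
    out = []
    i = j = 0
    while i < len(before) and j < len(after):
        bkey = before[i][:KEY_WIDTH]
        akey = after[j][:KEY_WIDTH]
        if akey < bkey:
            out.append((INSERT, after[j]))
            j += 1
        elif akey > bkey:
            out.append((DELETE, before[i]))
            i += 1
        else:
            code = COPY if before[i] == after[j] else EDIT
            out.append((code, after[j]))
            # Both sides advance, so a second after row with the same
            # key no longer has a before row to match against.
            i += 1
            j += 1
    out.extend((DELETE, row) for row in before[i:])
    out.extend((INSERT, row) for row in after[j:])
    return out

before = [
    (111, 1, 1, "Y", "N"),
    (111, 2, 2, "Y", "N"),
]
# A key sort does not fix the order of rows sharing a key; here the
# NULL/NULL row happens to come first.
after = [
    (111, 1, 1, None, None),
    (111, 1, 1, "Y", "N"),
    (111, 2, 2, "Y", "N"),
]

for code, row in change_capture(before, after):
    print(code, row)
```

Under this model the NULL/NULL row is matched first and comes out as an edit, and the identical Y/N row that follows finds no partner and comes out as an insert with code 1, which is the symptom described in the first post. If the two duplicates arrive in the other order, the NULL/NULL row becomes the insert instead.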

parag.s.27 · Post by **parag.s.27** » Wed Jan 09, 2008 11:41 pm

Minhajuddin wrote:This is what the documentation says:

The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.

From my tests I found out that the input data must *not* have any duplicates. But, I wonder why the documentation doesn't specify that explicitly.

If I clear the partitioning, or if I select the partition type "ENTIRE", will it compare the incoming duplicate-key data from the after stage against the before stage in a single partition? That is, each time a duplicate key comes in, will it search the entire before stage?

Minhajuddin · Post by **Minhajuddin** » Thu Jan 10, 2008 2:17 pm

parag.s.27 wrote: If I clear the partitioning, or if I select the partition type "ENTIRE", will it compare the incoming duplicate-key data from the after stage against the before stage in a single partition? That is, each time a duplicate key comes in, will it search the entire before stage?

I am sorry, I don't understand what you are saying.

You need to remove the duplicates on the key columns (the same key columns used in the CDC stage) before you send your data to the Change Capture stage. Period. Duplicates "confuse" the CDC stage.
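
As a rough illustration of that de-duplication step, here is the logic in Python. Inside a job you would normally do this with a stage such as Remove Duplicates rather than with code. Assumptions: rows are tuples with the three key columns first, and the last row seen for a key is the one to keep. Which duplicate survives is a business decision, so treat that as a placeholder. The name `dedupe_on_keys` is made up for this sketch.

```python
def dedupe_on_keys(rows, key_width=3):
    """Keep one row per key: the last one seen (an assumed rule).

    Keys come out in the order they were first seen, because a dict
    keeps its insertion order when an existing key is overwritten.
    """
    latest = {}
    for row in rows:
        latest[row[:key_width]] = row
    return list(latest.values())

source = [
    (111, 1, 1, "Y", "N"),
    (111, 1, 1, None, None),
    (111, 2, 2, "Y", "N"),
]

for row in dedupe_on_keys(source):
    print(row)
```

After this step each key combination appears once, so the data sent to the Change Capture stage has no duplicate keys left to trip over.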