Query Regarding Change Capture stage malfunction

parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Query Regarding Change Capture stage malfunction

Post by parag.s.27 »

I have the following scenario:

I am running a Change Capture between the source table and the target table. The source table holds the incremental load.

The data currently present in the source and the target is:

SOURCE: -

Code: Select all

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     1             1            NULL  NULL
111     2             2            Y     N

TARGET: -

Code: Select all

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     2             2            Y     N
Here RefNo., ApplicantNo. and ApplicantID are the keys on which I am doing the Change Capture. So the source record with Val1 = NULL and Val2 = NULL should come out of the Change Capture stage as an update, while the remaining two should be exact copies.

But the first source record also comes out of the CC output, and as an Insert with change code = 1. This should not happen, because it is an exact copy. I then removed the record with Val1 = NULL and Val2 = NULL from the source, so my source vs. target is now:

SOURCE: -

Code: Select all

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     2             2            Y     N

TARGET: -

Code: Select all

RefNO.  ApplicantNo.  ApplicantID  Val1  Val2
111     1             1            Y     N
111     2             2            Y     N
This time nothing came out of the CC stage; it treated both source records as exact copies. So is it the case that when multiple records with the same key combination arrive in the source, the CC stage cannot correctly classify all of them as copy or update?

Can someone help with this?
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Post by samsuf2002 »

Which properties are you using in the CDC stage? I think the change value property should be 'Explicit keys and All values'.
hi sam here
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm

Post by crouse »

Been there and observed the same behavior.
See my post "viewtopic.php?t=111348".

You can't count on the CDC stage when a key appears multiple times in the source (before) link.

Bummer, huh? You need to do a join, then some fancy footwork, if you plan on loading them into a Type II SCD with the latest occurrence as the current row and the others as history.
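
In outline, the "fancy footwork" could look something like the following. This is only a minimal Python sketch, assuming rows arrive oldest-first within each key. The helper `split_current_and_history` and the tuple row layout are hypothetical, and in a real job this would be built from stages rather than code.

```python
from collections import defaultdict

def split_current_and_history(rows, n_keys):
    """Split rows so the last row seen per key is 'current', the rest 'history'.

    Assumption: within a key, arrival order reflects recency (oldest first).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[:n_keys]].append(row)

    current, history = [], []
    for key_rows in groups.values():
        *older, newest = key_rows   # every group has at least one row
        history.extend(older)
        current.append(newest)
    return current, history

# Source rows from the original question; the first three columns are the keys.
source = [
    (111, 1, 1, "Y", "N"),
    (111, 1, 1, None, None),
    (111, 2, 2, "Y", "N"),
]
current, history = split_current_and_history(source, 3)
print("current:", current)
print("history:", history)
```

Only the `current` rows, one per key, would then be compared against the target.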

-Craig
Craig Rouse
Griffin Resources, Inc
www.griffinresources.com
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

This is what the documentation says:
The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.

From my tests, I found that the input data must *not* contain any duplicates. I wonder why the documentation doesn't say so explicitly.
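
One way to see why duplicates matter, given the sorted-input assumption quoted above, is a toy model of a sorted merge compare. This is only a sketch in Python, not DataStage's actual implementation: the function `change_capture`, the tuple row layout and the one-to-one pairing rule are all assumptions made for illustration. The change codes (0 copy, 1 insert, 2 delete, 3 edit) follow the stage defaults. The target is treated as the before link and the source as the after link, and the NULL row is assumed to arrive ahead of its duplicate, since a key-only sort does not by itself determine the order inside a key group.

```python
# Simplified model of a sorted-merge change capture; NOT DataStage's real code.
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3

def change_capture(before, after, n_keys):
    """Walk two key-sorted row lists in step, pairing each row at most once."""
    def key(row):
        return row[:n_keys]

    # Sort on the key columns only; order within a duplicate key group is kept.
    before = sorted(before, key=key)
    after = sorted(after, key=key)

    out = []
    i = j = 0
    while i < len(before) and j < len(after):
        kb, ka = key(before[i]), key(after[j])
        if ka == kb:
            # Keys match: identical rows are copies, otherwise an edit.
            out.append((COPY if after[j] == before[i] else EDIT, after[j]))
            i += 1
            j += 1
        elif ka < kb:
            # After row has no partner left in before: reported as an insert.
            out.append((INSERT, after[j]))
            j += 1
        else:
            # Before row has no partner in after: reported as a delete.
            out.append((DELETE, before[i]))
            i += 1
    out.extend((INSERT, row) for row in after[j:])
    out.extend((DELETE, row) for row in before[i:])
    return out

target = [(111, 1, 1, "Y", "N"), (111, 2, 2, "Y", "N")]
source_with_duplicate = [
    (111, 1, 1, None, None),   # assumed to arrive first within its key group
    (111, 1, 1, "Y", "N"),
    (111, 2, 2, "Y", "N"),
]
for code, row in change_capture(target, source_with_duplicate, 3):
    print(code, row)
```

Under these assumptions the NULL row pairs with the single target row and comes out as an edit, which leaves the identical Y/N row unpaired, so it falls through as an insert with change code 1. That matches what parag.s.27 reported. Unlike this model, the stage in the thread did not emit the copies.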
Minhajuddin

parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Post by parag.s.27 »

Minhajuddin wrote: This is what the documentation says:
The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.

From my tests, I found that the input data must *not* contain any duplicates. I wonder why the documentation doesn't say so explicitly.
If I clear the partitioning, or select "ENTIRE" as the partition type, will the stage compare the duplicate-key rows arriving on the after link against the before link in a single partition? That is, each time a duplicate key comes in, will it search the entire before data?
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

parag.s.27 wrote: If I clear the partitioning, or select "ENTIRE" as the partition type, will the stage compare the duplicate-key rows arriving on the after link against the before link in a single partition? That is, each time a duplicate key comes in, will it search the entire before data?
I am sorry, I don't understand what you are saying.

You need to remove the duplicates on the key columns (the same key columns used in the CDC stage) before you send your data to the Change Capture stage. Period. Duplicates "confuse" the CDC stage.
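
As a rough illustration of that de-duplication step, here is a small Python sketch. `dedupe_on_keys` is a hypothetical helper, and "last row wins" is an assumed rule; which duplicate to keep is a business decision. In a job this would typically be done with a stage such as Remove Duplicates ahead of the Change Capture.

```python
def dedupe_on_keys(rows, n_keys):
    """Keep one row per key; a later row replaces an earlier one with the same key.

    Assumption: the last row seen for a key is the one that should survive.
    """
    latest = {}
    for row in rows:
        latest[row[:n_keys]] = row
    # dicts preserve insertion order, so keys come out in first-seen order
    return list(latest.values())

# Source rows from the original question; the first three columns are the keys.
source = [
    (111, 1, 1, "Y", "N"),
    (111, 1, 1, None, None),
    (111, 2, 2, "Y", "N"),
]
deduped = dedupe_on_keys(source, 3)
print(deduped)
```

With one row per key on each link, every source row has at most one partner in the target, which is the situation the stage handled correctly in the second test of the original question.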
Minhajuddin
