Change Detect Capture (CDC) Stage overview

Amin
Premium Member
Posts: 27
Joined: Fri Oct 24, 2014 10:02 am

Post by Amin »

My job design looks like this:

Source1==>SortStage1==>
                        CDC Stage ==> SortStage3 ==> Target
Source2==>SortStage2==>
My questions are:
1: Since the CDC stage has a built-in option for sorting its input data, is using a Sort stage before the CDC stage a bad approach, and does it add performance overhead?
2: Is the output data of the CDC stage sorted or not?
3: Do we need a Sort stage after the CDC stage (SortStage3 in the design above), or is the output data already sorted?
4: Does the CDC stage compare data in parallel or in sequential mode when more than one server is available?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. Input link sorting is identical to using a Sort stage, except that the Sort stage gives more flexibility (for example, allocation of memory to the sort and generation of a key change column).

2. Probably, since its input is sorted. There's nothing within the stage to change the sorted order of rows processed. However, if you re-partition downstream of the stage, all bets are off.

3. See 2.

4. Parallel (irrespective of the number of servers available). So you must ensure that your data are correctly partitioned.
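To illustrate points 2 and 4, here is a minimal Python sketch (not DataStage code) of why key-partitioned, sorted data stays sorted within each partition, and why repartitioning downstream can break that. The helper names `modulus_partition` and `round_robin_repartition` are made up for the illustration, and the arrival order used in the repartition step is an assumption; a real engine does not guarantee any particular order.

```python
def is_sorted(rows, key):
    """True if rows are in non-descending order of key."""
    return all(rows[i][key] <= rows[i + 1][key] for i in range(len(rows) - 1))


def modulus_partition(rows, key, n):
    """Key-based partitioning: rows with the same key land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def round_robin_repartition(parts, n):
    """Deal rows out to n new partitions without regard to key.

    Assumption: rows arrive partition by partition. A real engine gives
    no guarantee about arrival order at all.
    """
    stream = [row for part in parts for row in part]
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(stream):
        new_parts[i % n].append(row)
    return new_parts


rows = [{"id": k} for k in range(8)]        # input already sorted on "id"
parts = modulus_partition(rows, "id", 2)    # each partition is still sorted
repartitioned = round_robin_repartition(parts, 2)

sorted_before = all(is_sorted(p, "id") for p in parts)          # True
sorted_after = all(is_sorted(p, "id") for p in repartitioned)   # False
```

In this toy run the first repartitioned partition receives ids 0, 4, 1, 5, so a downstream stage that relies on sorted input would need the data sorted again.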
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by Amin »

#1, #2: Could you kindly provide some more detail?

#4: If the CDC stage executes in parallel, how does it compare the data when the before data set record for a key is on one server and the after data set record for the same key is on another server?
Kindly review this.

ray.wurlod, thanks for the reply.

Post by ray.wurlod »

More detail? Not really. All sorting in DataStage parallel jobs uses the tsort operator. The Sort stage gives more options than either input link sorting or inserted tsort operators.

The data have to come together (on the same server) to be processed by the stage. Correct partitioning will guarantee key adjacency, so change can reliably be detected because both (all) relevant records will be together.
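As a rough illustration of that idea, the following Python sketch (not DataStage code) partitions both inputs on the same key, sorts each partition, and then does a merge-style comparison per partition. Everything here is an assumption made for the example: the function names, the string labels for the change types, the integer key, and the simplification that keys are unique within each input.

```python
def modulus_partition(rows, key, n):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def change_capture(before, after, key):
    """Merge-compare two lists sorted on key (keys assumed unique).

    Returns (label, row) pairs; unchanged rows are dropped.
    """
    out = []
    i = j = 0
    while i < len(before) and j < len(after):
        b, a = before[i], after[j]
        if b[key] == a[key]:
            if b != a:
                out.append(("edit", a))      # same key, different values
            i += 1
            j += 1
        elif b[key] < a[key]:
            out.append(("delete", b))        # key only in the before set
            i += 1
        else:
            out.append(("insert", a))        # key only in the after set
            j += 1
    out.extend(("delete", b) for b in before[i:])
    out.extend(("insert", a) for a in after[j:])
    return out


def run_partitioned(before, after, key, n):
    """Partition both inputs identically, sort, and compare partition by partition."""
    results = []
    pairs = zip(modulus_partition(before, key, n), modulus_partition(after, key, n))
    for b_part, a_part in pairs:
        b_part.sort(key=lambda r: r[key])
        a_part.sort(key=lambda r: r[key])
        results.extend(change_capture(b_part, a_part, key))
    return results


before = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}, {"id": 4, "v": "d"}]
after = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}, {"id": 5, "v": "e"}]

changes = run_partitioned(before, after, "id", 2)
```

Because both inputs use the same key-based partitioning, each partition sees the before and after rows for every key it owns, so the per-partition results add up to the same set of changes a single sequential comparison would find. If the two inputs were partitioned differently, matching keys could land in different partitions and would be reported as spurious inserts and deletes.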