My questions are:
1: Since the CDC stage has a built-in option for sorting data, is putting a Sort stage before it a bad approach, and does it add performance overhead?
2: Is the output of the CDC stage sorted or not?
3: Do we need a Sort stage after CDC (SortStage3 in the design above) to sort the data, or is the output already sorted?
4: Does the CDC stage compare data in parallel or in sequential mode when more than one server is available?
1. Input link sorting is identical to using a Sort stage, except that the Sort stage gives more flexibility (for example, allocation of memory to the sort, or generation of a key change column).
2. Probably, since its input is sorted. There's nothing within the stage to change the sorted order of rows processed. However, if you re-partition downstream of the stage, all bets are off.
3. See 2.
4. Parallel (irrespective of the number of servers available). So you must ensure that your data are correctly partitioned.
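To illustrate answers 1 and 2, here is a minimal Python sketch (not DataStage code) of a merge-style comparison over two inputs that are already sorted by key. The function name `capture_changes` and the change labels are assumptions made for illustration, not the stage's actual change codes. Because the loop only ever moves forward through the sorted inputs, the output comes out in key order as well.

```python
def capture_changes(before, after):
    """Compare two lists of (key, value) pairs, each sorted by unique key.

    Yields (key, change) where change is 'insert', 'delete' or 'edit'.
    Unchanged rows are skipped in this sketch.
    """
    i = j = 0
    while i < len(before) and j < len(after):
        b_key, b_val = before[i]
        a_key, a_val = after[j]
        if b_key == a_key:
            # Same key on both sides: report only if the value changed.
            if b_val != a_val:
                yield a_key, "edit"
            i += 1
            j += 1
        elif b_key < a_key:
            # Key exists only in the before data.
            yield b_key, "delete"
            i += 1
        else:
            # Key exists only in the after data.
            yield a_key, "insert"
            j += 1
    # Whatever is left on either side has no partner on the other.
    for b_key, _ in before[i:]:
        yield b_key, "delete"
    for a_key, _ in after[j:]:
        yield a_key, "insert"


before = [(1, "a"), (2, "b"), (4, "d")]
after = [(1, "a"), (2, "B"), (3, "c")]
changes = list(capture_changes(before, after))
print(changes)
```

If either input were not sorted on the key, this kind of comparison would report spurious inserts and deletes, which is why the sort (whether on the input link or in a separate Sort stage) has to happen somewhere upstream.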
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
#4: If CDC stage execution is parallel, how does it compare data when the record for a given key in the before data set is on one server and the matching record in the after data set is on another server?
Kindly review this.
More detail? Not really. All sorting in DataStage parallel jobs uses the tsort operator. The Sort stage gives more options than either input link sorting or inserted tsort operators.
The data have to come together (on the same server) to be processed by the stage. Correct partitioning will guarantee key adjacency, so change can reliably be detected because both (all) relevant records will be together.
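The point about partitioning can be sketched in Python (again, an illustration rather than DataStage code). The helpers `partition_by_key` and `changed_keys`, the modulus-based partitioning function, and the two-partition setup are all assumptions; the per-partition comparison also uses a dictionary lookup rather than a sorted comparison, purely to keep the example short. The idea is only that when both inputs are partitioned on the same key with the same function, matching before and after records land in the same partition and can be compared there without reference to any other partition.

```python
def partition_by_key(rows, num_partitions):
    """Assign each (key, value) row to a partition using key % num_partitions.

    Integer keys are assumed so the assignment is deterministic.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[key % num_partitions].append((key, value))
    return partitions


def changed_keys(before_part, after_part):
    """Return the sorted keys whose values differ within one partition."""
    before_map = dict(before_part)
    after_map = dict(after_part)
    all_keys = sorted(set(before_map) | set(after_map))
    return [k for k in all_keys if before_map.get(k) != after_map.get(k)]


# The two inputs arrive in different orders, as they might from two sources.
before = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
after = [(4, "d"), (3, "C"), (2, "b"), (1, "A")]

num_partitions = 2
before_parts = partition_by_key(before, num_partitions)
after_parts = partition_by_key(after, num_partitions)

# Each partition is compared on its own; no partition needs rows from another.
per_partition = [
    changed_keys(b, a) for b, a in zip(before_parts, after_parts)
]
print(per_partition)
```

If the two inputs were partitioned differently (for example, one by key and one round robin), a key's before and after records could end up in different partitions and the change would be missed or misreported.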