Change data capture

Posted: Mon Aug 26, 2013 1:15 pm
by mydsworld
Looking for some advice on best practice for capturing 'changed data' at huge data volumes. Assume I have a Base table with a huge data volume and I get a full dump of the source data in a Staging table. Will the 'Change Capture' stage be good enough (performance-wise) to find the changed data? (I was told it is not.) If not, what are the alternatives?

Thanks.

Posted: Mon Aug 26, 2013 2:23 pm
by suryadev
Hello

The CDC stage can process huge amounts of data. Select the key fields from both the base table and the staging table.
There can be three links coming out of the CDC stage: inserts, updates and deletes.

Posted: Mon Aug 26, 2013 4:53 pm
by ray.wurlod
Be very careful with terminology. CDC and Change Capture are quite different stage types.

Re: Change data capture

Posted: Mon Aug 26, 2013 6:00 pm
by SURA
mydsworld wrote: 'Change Capture' stage be good enough (performance wise)
Last year I ran a test specifically on performance, and I could not find any notable difference between the stages.
I tried Change Capture, SCD and a Full Outer Join. I passed 100,000 records through each for the test and did not see any big difference in performance.

NOTE: Keep Ray's comment in mind: CDC and Change Capture are totally different stages.

Posted: Mon Aug 26, 2013 6:30 pm
by mydsworld
I am talking about the 'Change Capture' stage in DS. Is it in any way detrimental to performance when it comes to huge data volumes?

Posted: Mon Aug 26, 2013 6:38 pm
by ray.wurlod
The Change Capture stage takes advantage of the fact that the incoming data are sorted on the change key values, so it only needs to hold one change key value at a time in memory. Therefore there is no added penalty beyond the linear scaling you should see as you increase the number of rows. And even this can be reduced by adding nodes and correctly partitioning the incoming data.
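
To illustrate why sorted input keeps memory use flat, here is a minimal Python sketch of a merge-style comparison. It is not DataStage's actual implementation. The function name change_capture, the row layout (key, values) and the string labels for the change codes are all assumptions made for this example, and it assumes each key appears at most once per input.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row layout: (change key, dict of value columns).
Row = Tuple[int, Dict[str, str]]

# Hypothetical labels for the kinds of change; not the stage's real codes.
COPY, INSERT, DELETE, EDIT = "copy", "insert", "delete", "edit"


def change_capture(
    before: Iterable[Row], after: Iterable[Row]
) -> Iterator[Tuple[str, int, Dict[str, str]]]:
    """Compare two inputs that are both sorted on the change key.

    Only the current row of each input is held in memory, so memory
    use does not grow with the number of rows. Keys are assumed unique
    within each input.
    """
    before_it, after_it = iter(before), iter(after)
    b = next(before_it, None)
    a = next(after_it, None)

    while b is not None or a is not None:
        if a is None or (b is not None and b[0] < a[0]):
            # Key exists only in the before data set: it was deleted.
            yield DELETE, b[0], b[1]
            b = next(before_it, None)
        elif b is None or a[0] < b[0]:
            # Key exists only in the after data set: it was inserted.
            yield INSERT, a[0], a[1]
            a = next(after_it, None)
        else:
            # Same key on both sides: compare the value columns.
            yield (COPY if b[1] == a[1] else EDIT), a[0], a[1]
            b = next(before_it, None)
            a = next(after_it, None)
```

Each input is read once from start to finish, so the work grows linearly with the row count. If both inputs are partitioned on the same key, each partition can run this comparison independently, which is where adding nodes helps.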