Change data capture

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Looking for some advice on best practice for capturing 'changed data' at huge data volumes. Assume I have a Base table with a huge data volume and I get a full dump of the source data in a Staging table. Will the 'Change Capture' stage be good enough (performance wise) to find the changed data? (I was told it is not.) If not, what are the alternatives?

Thanks.
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Hello

The CDC stage can process huge amounts of data. Select the key fields from the base table and also from the staging table.
There can be three links coming out of the CDC stage: inserts, updates and deletes.
Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Be very careful with terminology. CDC and Change Capture are quite different stage types.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Change data capture

Post by SURA »

mydsworld wrote:'Change Capture' stage be good enough (performance wise)
Last year I ran a test specifically on performance and could not find any notable difference between the stages.
I tried Change Capture, SCD and Full Outer Join. I passed 100,000 records through each and was not able to find any big difference in performance.

NOTE: Also consider Ray's comment, which makes a totally different point.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

I am talking about the 'Change Capture' stage in DS. Is it in any way detrimental to performance when it comes to huge data volumes?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The Change Capture stage takes advantage of the fact that the incoming data are sorted on the change key values, so it only needs to process one change key value at a time in memory. Therefore there is no added detriment beyond the linear scaling you should see as you increase the number of rows. And even this can be reduced by adding nodes and correctly partitioning the incoming data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
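
To illustrate the point about sorted inputs, here is a minimal Python sketch of a merge-style comparison. It is not DataStage code and not how the stage is actually implemented. It only shows why two inputs sorted on the change key can be compared in a single pass while holding one row from each side in memory. The row layout, the function name `change_capture`, and the string labels "insert", "delete" and "edit" are assumptions made for the example, and keys are assumed to be unique on each side.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical row layout: (change key, value column).
Row = Tuple[int, str]


def change_capture(before: Iterable[Row],
                   after: Iterable[Row]) -> Iterator[Tuple[str, Row]]:
    """Yield (label, row) pairs for two inputs already sorted on the key.

    Only the current row from each input is held in memory, so the work
    grows linearly with the number of rows.
    """
    b_iter, a_iter = iter(before), iter(after)
    b, a = next(b_iter, None), next(a_iter, None)
    while b is not None or a is not None:
        if a is None or (b is not None and b[0] < a[0]):
            yield ("delete", b)   # key exists only in the before set
            b = next(b_iter, None)
        elif b is None or a[0] < b[0]:
            yield ("insert", a)   # key exists only in the after set
            a = next(a_iter, None)
        else:
            if a[1] != b[1]:
                yield ("edit", a)  # same key, different value
            # Unchanged rows (copies) are dropped in this sketch.
            b, a = next(b_iter, None), next(a_iter, None)


# Base table rows and staging (full dump) rows, both sorted on the key.
base = [(1, "a"), (2, "b"), (4, "d")]
staging = [(1, "a"), (2, "B"), (3, "c")]
changes = list(change_capture(base, staging))
print(changes)
```

Because each key is compared independently of every other key, the same logic could run on several partitions at once, provided both inputs are partitioned on the change key so that matching keys land in the same partition.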