Page 1 of 1

change capture in server jobs

Posted: Mon Mar 20, 2006 8:15 pm
by sainath
HI
I want load only delta data(change) records into target.Is there any other way other than making target as lookup and update records.

Posted: Mon Mar 20, 2006 8:18 pm
by rasi
Hi Sainath

There is no direct stage which will provide you the change capture in Server jobs. You do have Change Capture stage in parallel.

Posted: Mon Mar 20, 2006 8:24 pm
by I_Server_Whale
Hi Sainath,

May we know the reason behind looking for an alternative approach than using the target as a look-up. Is it performance? Why exactly are you looking for a different method?

Thanks,
Naveen.

Posted: Mon Mar 20, 2006 10:07 pm
by kris007
I didn't find a better way other than looking up on target data to do the delta load. Even I am curious to find if there is another better way to do it. And yeah ..for performance.

Kris.

Posted: Mon Mar 20, 2006 10:24 pm
by chulett
Without something upstream feeding deltas to you - like the CDC process built into Oracle or other third party utilities that provide similar functionality - then you'll have to do all the work of Change Data Detection in your jobs on your own.

And yes, this involves preloaded hashed files of current keys and values to determine via a lookup if a given record is new or already exists in the database, and if existing, if anything you care about in it has changed.

Posted: Mon Mar 20, 2006 10:35 pm
by kris007
Thats the way I am doing it right now. I am loading the lastupdated timestamp from the target into a hashed file and then compare the lastupdated timestamp inthe source against the timestamp in the hashed file and then loading(update/insert) the delta data based upon the records found in target or not.
Without something upstream feeding deltas to you - like the CDC process built into Oracle
Craig, can you elaborate on that. I didnt seem to get it and would like to know about it.

Kris.

Posted: Mon Mar 20, 2006 10:49 pm
by rasi
Kris

Here is the brief explanation of Oralce CDC from the oracle website. For more information refer Oracle Manual.


Oracle9i Database introduced the Change Data Capture (CDC) feature. CDC captures all the inserts, updates, and deletes made to user tables. These changes are stored in a database object called a change table, and the change data is made available to applications in a controlled way through what is called a subscriber view.

Prior to Oracle 10g, the change data was synchronously fed from the source. In Oracle Database 10g, it becomes possible to asynchronously feed the change data, and there are multiple options for how this feed is obtained and managed. Asynchronous data capture reads the changed data from Oracle redo log files. As a result of this, dependence on the transaction activity is avoided and performance overhead is reduced. Oracle Streams Methodology is used to capture change data from redo log files.

Asynchronous CDC requires a streams configuration for each CDC change set. These configurations include the streams capture and apply processes that read change data from redo log files and inserts it into CDC change tables.

Asynchronous CDC requires both the source and staging databases to be Oracle Database 10g.


Thanks

Posted: Mon Mar 20, 2006 11:11 pm
by chulett
The gist of it is in Rasi's post. Basically, you are on the receiving end of every insert, update and delete from the source system. You can then use that information to drive the work you need to do in your target.

Problem is, you can't 100 percent rely on what 'action' you get from CDC. In some situations, something called an Insert from the CDC process could actually be an Update in your target database. And just because a record was updated, triggering the CDC process, it doesn't necessarily mean anything actually changed on the record... or if something did change, that it is a field that you care about. Deletes are only deletes if you have the original record 'on file' in the target.

You basically still end up having to do the work mentioned previously to handle everything properly. CDC just typically helps constrain the amount of data you get to work with - you don't need to try and figure out what's new or changed.