change capture in server jobs

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sainath
Premium Member
Premium Member
Posts: 138
Joined: Fri Nov 19, 2004 3:57 pm

change capture in server jobs

Post by sainath »

HI
I want load only delta data(change) records into target.Is there any other way other than making target as lookup and update records.
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi Sainath

There is no direct stage which will provide you the change capture in Server jobs. You do have Change Capture stage in parallel.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Hi Sainath,

May we know the reason behind looking for an alternative approach than using the target as a look-up. Is it performance? Why exactly are you looking for a different method?

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

I didn't find a better way other than looking up on target data to do the delta load. Even I am curious to find if there is another better way to do it. And yeah ..for performance.

Kris.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Without something upstream feeding deltas to you - like the CDC process built into Oracle or other third party utilities that provide similar functionality - then you'll have to do all the work of Change Data Detection in your jobs on your own.

And yes, this involves preloaded hashed files of current keys and values to determine via a lookup if a given record is new or already exists in the database, and if existing, if anything you care about in it has changed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Thats the way I am doing it right now. I am loading the lastupdated timestamp from the target into a hashed file and then compare the lastupdated timestamp inthe source against the timestamp in the hashed file and then loading(update/insert) the delta data based upon the records found in target or not.
Without something upstream feeding deltas to you - like the CDC process built into Oracle
Craig, can you elaborate on that. I didnt seem to get it and would like to know about it.

Kris.
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Kris

Here is the brief explanation of Oralce CDC from the oracle website. For more information refer Oracle Manual.


Oracle9i Database introduced the Change Data Capture (CDC) feature. CDC captures all the inserts, updates, and deletes made to user tables. These changes are stored in a database object called a change table, and the change data is made available to applications in a controlled way through what is called a subscriber view.

Prior to Oracle 10g, the change data was synchronously fed from the source. In Oracle Database 10g, it becomes possible to asynchronously feed the change data, and there are multiple options for how this feed is obtained and managed. Asynchronous data capture reads the changed data from Oracle redo log files. As a result of this, dependence on the transaction activity is avoided and performance overhead is reduced. Oracle Streams Methodology is used to capture change data from redo log files.

Asynchronous CDC requires a streams configuration for each CDC change set. These configurations include the streams capture and apply processes that read change data from redo log files and inserts it into CDC change tables.

Asynchronous CDC requires both the source and staging databases to be Oracle Database 10g.


Thanks
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The gist of it is in Rasi's post. Basically, you are on the receiving end of every insert, update and delete from the source system. You can then use that information to drive the work you need to do in your target.

Problem is, you can't 100 percent rely on what 'action' you get from CDC. In some situations, something called an Insert from the CDC process could actually be an Update in your target database. And just because a record was updated, triggering the CDC process, it doesn't necessarily mean anything actually changed on the record... or if something did change, that it is a field that you care about. Deletes are only deletes if you have the original record 'on file' in the target.

You basically still end up having to do the work mentioned previously to handle everything properly. CDC just typically helps constrain the amount of data you get to work with - you don't need to try and figure out what's new or changed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply