
Real time ETL in datastage using CDC transaction stage

Posted: Wed Oct 19, 2016 10:56 am
by dbdecoy
Hi,

I am looking to achieve a real-time ETL scenario using the CDC Transaction stage. I have read about this stage but need a few clarifications. When using this stage, will the DataStage job be online all the time? Also, does this stage trigger the DataStage job whenever there is an update in the source DB?

If not by using the CDC Transaction stage, can you please suggest how else we could achieve real-time ETL in DataStage?

Note: we currently use the IBM Change Data Capture tool to get real-time updates from the source. As of now we load the data into our staging area and then run our ETL jobs in daily batches. We are trying to remove the staging area by using the CDC Transaction stage, so that the ETL process runs in real time.

Please let me know if you need any further details

Thanks in advance

Re: Real time ETL in datastage using CDC transaction stage

Posted: Mon Oct 24, 2016 3:02 am
by dbdecoy
Hi,

Could anyone please help me with this?

Thanks

Posted: Mon Oct 24, 2016 7:15 am
by chulett
Generally, regardless of stage, 'real time' in this context would mean publishing your job as an 'always on' job, a web service. You might get more information by taking a look through the SOA Editions forum here. I would be curious about what kind of volume you'll need to be processing.

Posted: Mon Oct 24, 2016 1:39 pm
by qt_ky
IBM Change data capture tool / IBM InfoSphere Data Replication software comes with a feature that you can use from its user interface to export or generate a DSX file, based on your replication subscription(s). You can import that into a DataStage project by using Designer, Import, and you will find it includes sequence jobs, routines, and various other jobs that will help you get a jump start on replicating the data.

As far as the always-on job question, I got the impression from documentation that it is an option, but I didn't get a chance to test it out.

Posted: Wed Nov 02, 2016 3:59 am
by dbdecoy
Thanks Craig, I will look into the SOA Editions forum and will update this thread. Regarding the volume, it should be 3 to 7 million records.

Posted: Wed Nov 02, 2016 5:02 am
by eostic
Hi. I haven't used this stage extensively since some very early tests when it came out, so I cannot vouch for its behavior. I do recall, quite a few years ago, that there was work done to ensure that an end-of-wave was inserted specifically, but I don't know exactly which release or which edition of the integration.

I am not an expert on CDC, but I recall that the stage was fairly smart and knew about subscriptions and bookmarks, and made it easier to integrate DataStage with CDC rather than using user exits in CDC to go to something like MQ. If it (still) behaves similarly to what I recall, then it uses CDC's own API to basically "listen" on a connection and receive changes as soon as they are known and sent by CDC. That makes it "always on". Again, this understanding could be outdated.

There was another pattern that was also popular with CDC, as someone mentioned above, where a .dsx and a script were provided that did a "smart cycling", I think by reading a sort of "checkpointed" flat file that was cut by CDC. A bit old-fashioned, but very reliable, as I recall.

Ernie
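
To make the bookmark and "smart cycling" idea above a little more concrete, here is a minimal sketch in plain Python. It is not the CDC Transaction stage's API and not the script that ships with CDC; every name in it (load_bookmark, save_bookmark, run_cycle, the JSON bookmark file) is hypothetical and only illustrates the general pattern: apply the changes that arrived since the last bookmark, then commit the new bookmark at the end of the wave.

```python
import json
import os
import tempfile


def load_bookmark(path):
    """Return the last committed position, or 0 if no bookmark exists yet."""
    try:
        with open(path) as f:
            return json.load(f)["position"]
    except FileNotFoundError:
        return 0


def save_bookmark(path, position):
    """Write the bookmark to a temp file, then rename it over the old one,
    so a crash never leaves a half-written bookmark behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"position": position}, f)
    os.replace(tmp, path)


def run_cycle(changes, bookmark_path, apply_change):
    """Process one wave: apply every change past the bookmark, then commit.

    `changes` stands in for the change feed (hypothetical: a simple list).
    Returns the number of changes applied in this cycle.
    """
    start = load_bookmark(bookmark_path)
    pending = changes[start:]
    for change in pending:
        apply_change(change)
    if pending:
        # End of wave: only now is the bookmark advanced.
        save_bookmark(bookmark_path, start + len(pending))
    return len(pending)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    bookmark_file = os.path.join(workdir, "bookmark.json")
    target = []
    feed = ["insert 1", "update 1", "delete 1"]
    print(run_cycle(feed, bookmark_file, target.append))  # first cycle
    print(run_cycle(feed, bookmark_file, target.append))  # nothing new
```

Because the bookmark is only advanced after the whole wave has been applied, a failure in the middle of a cycle means the same changes are read again on the next run. In a sketch like this the target load therefore needs to tolerate replays (for example, by using upserts).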

Posted: Wed Nov 02, 2016 10:24 am
by dbdecoy
Hi, we are currently testing the always-on job option by creating a datastore in CDC that connects to DataStage, but we are facing some problems due to a version mismatch between the CDC tool and DataStage. I will update this thread once we have made some progress.