CDC is capable without primary key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sureshchandra
Participant
Posts: 92
Joined: Mon May 07, 2007 4:26 am

CDC is capable without primary key

Post by sureshchandra »

Hi all,
My source is DB2 and Target is Oracle.In my source we have no primary key(PK is defined by human).
Some table have UNIQUE KEY but there are transaction that update rows at these UNIQUE KEY's fields .

Each table is having 100 to 250 million rows with different operations like insert,update and delete.

Please help :

1.Can DataStage capture datachanges when the source table is updated at PRIMARY KEY ?
2.Can Datastage config to use RRN(Row Related Number) to identify rows when replicating data changes from DB2/iSeries to Oracle ( Using Reading Transaction Journal method) .If it can , how can we config ?

Thanks in advance
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Check about 'Transformation Server' tool which is now called as Infosphere CDC (change data capture). This component reads REDO logs and metadata tables and does replication/mirroring. No need of primary key for this replication process.
Kandy
_________________
Try and Try again…You will succeed atlast!!
sureshchandra
Participant
Posts: 92
Joined: Mon May 07, 2007 4:26 am

Post by sureshchandra »

Hi Kandy,
Any other option with Datastage.

Thanks
parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India
Contact:

Post by parag.s.27 »

sureshchandra wrote:Hi Kandy,
Any other option with Datastage.

Thanks
There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Suresh was asking for an option which does not use KEYs for finding the change/delta.
Kandy
_________________
Try and Try again…You will succeed atlast!!
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Any other option with Datastage.
Suresh, Can you please explain how you can manually identify a change for a source record in your scenario? If you take a look at a record from source, will you be able to tell that the record is insert/update or a delete? If yes, tell us how you can.. So we can try to suggest you the same logic in DS.
Kandy
_________________
Try and Try again…You will succeed atlast!!
etldwh_techie2050
Participant
Posts: 9
Joined: Mon Jun 29, 2009 4:27 pm

Post by etldwh_techie2050 »

parag.s.27 wrote: There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Sr. forum members may think this is a topic hijack, sorry for that but it's related...

Since you have mentioned here about issues that SCD & CDC has when there are multiple records for the Client_Id. Is there way around such cases? If yes, what you recommend developer to do? Please describe.

Thanks in advance Parag!
muito obrigado!
etldwh_techie2050
algfr
Participant
Posts: 106
Joined: Fri Sep 09, 2005 7:42 am

Post by algfr »

etldwh_techie2050 wrote:
parag.s.27 wrote: There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Sr. forum members may think this is a topic hijack, sorry for that but it's related...

Since you have mentioned here about issues that SCD & CDC has when there are multiple records for the Client_Id. Is there way around such cases? If yes, what you recommend developer to do? Please describe.

Thanks in advance Parag!
Did you try the lookup stage ? Other than that is there a "natural" key you could use ?
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

This question is for Enterprise/PX and looking up on target will not work as lookup is not dynamically updated.

If you really don't want duplicates in target switch to server methodology like comparing each record with its previous record in transformer using stage variables, assign each record a count, send only first record to CDC and bypass others.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Forgot to mention

it is hijacking. orignal poster didn't had a key but you have. Although you have duplicates based on natural key.

Also if records can be deleted in source then you need to handle it differently.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
etldwh_techie2050
Participant
Posts: 9
Joined: Mon Jun 29, 2009 4:27 pm

Post by etldwh_techie2050 »

To Kunal, since it's a related discussion, I would opt for posting my queries in this thread itself rather than creating lots of threads under same topic. What do you say? :D :D :D

Also, you have not address the problem that Parag has mentioned in his reply which is as follows....

<B>But there is a problem with both these stages.The input records must not have multiple records with same Key. </B> For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. <I>If you feed these records as an input to CDC or SCD then <B>only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy.</B> So you need to keep in mind the data scenario. </I>

How do we have to deal in this case? if there are 3 records with same Client_Id and all of them change in some columns and as per parag the issue is only 1st record will be processed properly and rest 2 records will be inserts.

So, will there be 5 records after SCD or CDC in this case? How to avoid this Kunal or Parag? What u experts recommend here?
muito obrigado!
etldwh_techie2050
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

etldwh_techie2050 wrote:To Kunal, since it's a related discussion, I would opt for posting my queries in this thread itself rather than creating lots of threads under same topic. What do you say?
Related != same. We ask people to start their own topics even for allegedly 'related' questions for multiple reasons, including: you record your specific O/S and version, you can mark the thread as 'Resolved' when you are satisfied and we don't end up with (potentially) multiple tangential conversations going on between different groups of people in a single thread, which can get rather... messy.

:!: So if someone feels the need to chime in with 'the same problem' or a 'related' problem to an existing post, as a common courtesy start a new topic and include a link to the possibly related post or posts, if you really feel the need. Get the URL from the address bar at the top of the screen and either paste it directly into your message or use the [URL] tags to embed it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
etldwh_techie2050
Participant
Posts: 9
Joined: Mon Jun 29, 2009 4:27 pm

Post by etldwh_techie2050 »

Fair enough, Chulett :D As you pointed out I am going to chime on 'the same problem' in new thread. See you there. Peace!
muito obrigado!
etldwh_techie2050
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Appreciate it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply