Page 1 of 1

CDC is capable without primary key

Posted: Thu Jan 22, 2009 4:27 am
by sureshchandra
Hi all,
My source is DB2 and Target is Oracle.In my source we have no primary key(PK is defined by human).
Some table have UNIQUE KEY but there are transaction that update rows at these UNIQUE KEY's fields .

Each table is having 100 to 250 million rows with different operations like insert,update and delete.

Please help :

1.Can DataStage capture datachanges when the source table is updated at PRIMARY KEY ?
2.Can Datastage config to use RRN(Row Related Number) to identify rows when replicating data changes from DB2/iSeries to Oracle ( Using Reading Transaction Journal method) .If it can , how can we config ?

Thanks in advance

Posted: Thu Jan 22, 2009 9:37 am
by kandyshandy
Check about 'Transformation Server' tool which is now called as Infosphere CDC (change data capture). This component reads REDO logs and metadata tables and does replication/mirroring. No need of primary key for this replication process.

Posted: Thu Jan 22, 2009 8:35 pm
by sureshchandra
Hi Kandy,
Any other option with Datastage.

Thanks

Posted: Thu Jan 22, 2009 11:05 pm
by parag.s.27
sureshchandra wrote:Hi Kandy,
Any other option with Datastage.

Thanks
There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.

Posted: Fri Jan 23, 2009 9:29 am
by kandyshandy
There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Suresh was asking for an option which does not use KEYs for finding the change/delta.

Posted: Fri Jan 23, 2009 9:34 am
by kandyshandy
Any other option with Datastage.
Suresh, Can you please explain how you can manually identify a change for a source record in your scenario? If you take a look at a record from source, will you be able to tell that the record is insert/update or a delete? If yes, tell us how you can.. So we can try to suggest you the same logic in DS.

Posted: Sun Jul 19, 2009 2:51 pm
by etldwh_techie2050
parag.s.27 wrote: There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Sr. forum members may think this is a topic hijack, sorry for that but it's related...

Since you have mentioned here about issues that SCD & CDC has when there are multiple records for the Client_Id. Is there way around such cases? If yes, what you recommend developer to do? Please describe.

Thanks in advance Parag!

Posted: Mon Jul 20, 2009 6:09 am
by algfr
etldwh_techie2050 wrote:
parag.s.27 wrote: There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.

But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Sr. forum members may think this is a topic hijack, sorry for that but it's related...

Since you have mentioned here about issues that SCD & CDC has when there are multiple records for the Client_Id. Is there way around such cases? If yes, what you recommend developer to do? Please describe.

Thanks in advance Parag!
Did you try the lookup stage ? Other than that is there a "natural" key you could use ?

Posted: Mon Jul 20, 2009 6:29 am
by priyadarshikunal
This question is for Enterprise/PX and looking up on target will not work as lookup is not dynamically updated.

If you really don't want duplicates in target switch to server methodology like comparing each record with its previous record in transformer using stage variables, assign each record a count, send only first record to CDC and bypass others.

Posted: Mon Jul 20, 2009 6:32 am
by priyadarshikunal
Forgot to mention

it is hijacking. orignal poster didn't had a key but you have. Although you have duplicates based on natural key.

Also if records can be deleted in source then you need to handle it differently.

Posted: Mon Jul 20, 2009 2:29 pm
by etldwh_techie2050
To Kunal, since it's a related discussion, I would opt for posting my queries in this thread itself rather than creating lots of threads under same topic. What do you say? :D :D :D

Also, you have not address the problem that Parag has mentioned in his reply which is as follows....

<B>But there is a problem with both these stages.The input records must not have multiple records with same Key. </B> For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. <I>If you feed these records as an input to CDC or SCD then <B>only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy.</B> So you need to keep in mind the data scenario. </I>

How do we have to deal in this case? if there are 3 records with same Client_Id and all of them change in some columns and as per parag the issue is only 1st record will be processed properly and rest 2 records will be inserts.

So, will there be 5 records after SCD or CDC in this case? How to avoid this Kunal or Parag? What u experts recommend here?

Posted: Mon Jul 20, 2009 3:56 pm
by chulett
etldwh_techie2050 wrote:To Kunal, since it's a related discussion, I would opt for posting my queries in this thread itself rather than creating lots of threads under same topic. What do you say?
Related != same. We ask people to start their own topics even for allegedly 'related' questions for multiple reasons, including: you record your specific O/S and version, you can mark the thread as 'Resolved' when you are satisfied and we don't end up with (potentially) multiple tangential conversations going on between different groups of people in a single thread, which can get rather... messy.

:!: So if someone feels the need to chime in with 'the same problem' or a 'related' problem to an existing post, as a common courtesy start a new topic and include a link to the possibly related post or posts, if you really feel the need. Get the URL from the address bar at the top of the screen and either paste it directly into your message or use the [URL] tags to embed it.

Posted: Mon Jul 20, 2009 5:54 pm
by etldwh_techie2050
Fair enough, Chulett :D As you pointed out I am going to chime on 'the same problem' in new thread. See you there. Peace!

Posted: Mon Jul 20, 2009 9:13 pm
by chulett
Appreciate it.