CDC is capable without primary key
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 92
- Joined: Mon May 07, 2007 4:26 am
CDC is capable without primary key
Hi all,
My source is DB2 and Target is Oracle.In my source we have no primary key(PK is defined by human).
Some table have UNIQUE KEY but there are transaction that update rows at these UNIQUE KEY's fields .
Each table is having 100 to 250 million rows with different operations like insert,update and delete.
Please help :
1.Can DataStage capture datachanges when the source table is updated at PRIMARY KEY ?
2.Can Datastage config to use RRN(Row Related Number) to identify rows when replicating data changes from DB2/iSeries to Oracle ( Using Reading Transaction Journal method) .If it can , how can we config ?
Thanks in advance
My source is DB2 and Target is Oracle.In my source we have no primary key(PK is defined by human).
Some table have UNIQUE KEY but there are transaction that update rows at these UNIQUE KEY's fields .
Each table is having 100 to 250 million rows with different operations like insert,update and delete.
Please help :
1.Can DataStage capture datachanges when the source table is updated at PRIMARY KEY ?
2.Can Datastage config to use RRN(Row Related Number) to identify rows when replicating data changes from DB2/iSeries to Oracle ( Using Reading Transaction Journal method) .If it can , how can we config ?
Thanks in advance
-
- Participant
- Posts: 597
- Joined: Fri Apr 29, 2005 6:19 am
- Location: Singapore
Check about 'Transformation Server' tool which is now called as Infosphere CDC (change data capture). This component reads REDO logs and metadata tables and does replication/mirroring. No need of primary key for this replication process.
Kandy
_________________
Try and Try again…You will succeed atlast!!
_________________
Try and Try again…You will succeed atlast!!
-
- Participant
- Posts: 92
- Joined: Mon May 07, 2007 4:26 am
-
- Participant
- Posts: 221
- Joined: Fri Feb 17, 2006 3:38 am
- Location: India
- Contact:
There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.sureshchandra wrote:Hi Kandy,
Any other option with Datastage.
Thanks
But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
Parag Saundattikar
Certified for Infosphere DataStage v8.0
-
- Participant
- Posts: 597
- Joined: Fri Apr 29, 2005 6:19 am
- Location: Singapore
Suresh was asking for an option which does not use KEYs for finding the change/delta.There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.
But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Kandy
_________________
Try and Try again…You will succeed atlast!!
_________________
Try and Try again…You will succeed atlast!!
-
- Participant
- Posts: 597
- Joined: Fri Apr 29, 2005 6:19 am
- Location: Singapore
Suresh, Can you please explain how you can manually identify a change for a source record in your scenario? If you take a look at a record from source, will you be able to tell that the record is insert/update or a delete? If yes, tell us how you can.. So we can try to suggest you the same logic in DS.Any other option with Datastage.
Kandy
_________________
Try and Try again…You will succeed atlast!!
_________________
Try and Try again…You will succeed atlast!!
-
- Participant
- Posts: 9
- Joined: Mon Jun 29, 2009 4:27 pm
Sr. forum members may think this is a topic hijack, sorry for that but it's related...parag.s.27 wrote: There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.
But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Since you have mentioned here about issues that SCD & CDC has when there are multiple records for the Client_Id. Is there way around such cases? If yes, what you recommend developer to do? Please describe.
Thanks in advance Parag!
muito obrigado!
etldwh_techie2050
etldwh_techie2050
Did you try the lookup stage ? Other than that is there a "natural" key you could use ?etldwh_techie2050 wrote:Sr. forum members may think this is a topic hijack, sorry for that but it's related...parag.s.27 wrote: There are two options for processing Delta Data. These are CDC (Change Data Capture) and SCD (Slowly Changing Dimension). Both are almost same as per functionality but SCD is faster as far as processing speed is concerned.
But there is a problem with both these stages. The input records must not have multiple records with same Key. For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. If you feed these records as an input to CDC or SCD then only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy. So you need to keep in mind the data scenario.
Since you have mentioned here about issues that SCD & CDC has when there are multiple records for the Client_Id. Is there way around such cases? If yes, what you recommend developer to do? Please describe.
Thanks in advance Parag!
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
This question is for Enterprise/PX and looking up on target will not work as lookup is not dynamically updated.
If you really don't want duplicates in target switch to server methodology like comparing each record with its previous record in transformer using stage variables, assign each record a count, send only first record to CDC and bypass others.
If you really don't want duplicates in target switch to server methodology like comparing each record with its previous record in transformer using stage variables, assign each record a count, send only first record to CDC and bypass others.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.![Wink :wink:](./images/smilies/icon_wink.gif)
Genius may have its limitations, but stupidity is not thus handicapped.
![Wink :wink:](./images/smilies/icon_wink.gif)
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
Forgot to mention
it is hijacking. orignal poster didn't had a key but you have. Although you have duplicates based on natural key.
Also if records can be deleted in source then you need to handle it differently.
it is hijacking. orignal poster didn't had a key but you have. Although you have duplicates based on natural key.
Also if records can be deleted in source then you need to handle it differently.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.![Wink :wink:](./images/smilies/icon_wink.gif)
Genius may have its limitations, but stupidity is not thus handicapped.
![Wink :wink:](./images/smilies/icon_wink.gif)
-
- Participant
- Posts: 9
- Joined: Mon Jun 29, 2009 4:27 pm
To Kunal, since it's a related discussion, I would opt for posting my queries in this thread itself rather than creating lots of threads under same topic. What do you say? :D :D :D
Also, you have not address the problem that Parag has mentioned in his reply which is as follows....
<B>But there is a problem with both these stages.The input records must not have multiple records with same Key. </B> For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. <I>If you feed these records as an input to CDC or SCD then <B>only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy.</B> So you need to keep in mind the data scenario. </I>
How do we have to deal in this case? if there are 3 records with same Client_Id and all of them change in some columns and as per parag the issue is only 1st record will be processed properly and rest 2 records will be inserts.
So, will there be 5 records after SCD or CDC in this case? How to avoid this Kunal or Parag? What u experts recommend here?
Also, you have not address the problem that Parag has mentioned in his reply which is as follows....
<B>But there is a problem with both these stages.The input records must not have multiple records with same Key. </B> For e.g. if your key is a Client_Id and there is a case when same Client does different multiple transactions then in your system there will be multiple records with same Client_Id. <I>If you feed these records as an input to CDC or SCD then <B>only first record is processed correctly and rest all are directly tagged as Inserts (Change_Code = 1) and are sent further even though these were exact copy.</B> So you need to keep in mind the data scenario. </I>
How do we have to deal in this case? if there are 3 records with same Client_Id and all of them change in some columns and as per parag the issue is only 1st record will be processed properly and rest 2 records will be inserts.
So, will there be 5 records after SCD or CDC in this case? How to avoid this Kunal or Parag? What u experts recommend here?
muito obrigado!
etldwh_techie2050
etldwh_techie2050
Related != same. We ask people to start their own topics even for allegedly 'related' questions for multiple reasons, including: you record your specific O/S and version, you can mark the thread as 'Resolved' when you are satisfied and we don't end up with (potentially) multiple tangential conversations going on between different groups of people in a single thread, which can get rather... messy.etldwh_techie2050 wrote:To Kunal, since it's a related discussion, I would opt for posting my queries in this thread itself rather than creating lots of threads under same topic. What do you say?
![Exclamation :!:](./images/smilies/icon_exclaim.gif)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 9
- Joined: Mon Jun 29, 2009 4:27 pm