Issues with SCD & CDC?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

etldwh_techie2050
Participant
Posts: 9
Joined: Mon Jun 29, 2009 4:27 pm

Issues with SCD & CDC?

Post by etldwh_techie2050 »

Experts,
This thread continues an earlier discussion. Please refer to the following topic:
viewtopic.php?p=334305#334305

Can you please address the problem Parag mentioned in his reply, quoted below?
But there is a problem with both these stages: the input must not contain multiple records with the same key. For example, if your key is Client_Id and the same client performs several different transactions, your system will hold multiple records with the same Client_Id. If you feed these records into the CDC or SCD stage, only the first record is processed correctly; all the rest are tagged as inserts (Change_Code = 1) and sent downstream even when they are exact copies. So you need to keep this data scenario in mind.
How do we deal with this case? If there are 3 records with the same Client_Id and all of them have changes in some columns, then per Parag only the first record will be processed properly and the other two will be treated as inserts. Will there be 5 records after the SCD or CDC stage in this case? How can we avoid this, Kunal or Parag? What do you experts recommend here?
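To make the quoted problem concrete, here is a minimal Python simulation (illustrative only, not the DataStage stage itself; the `Client_Id`/`balance` columns and the CDC-style change codes 0/1/3 are assumed for the example). Because the compare is keyed on Client_Id alone, there is only one target row per key, so once it is consumed every further source row with that key falls through as an insert:

```python
# Target ("before") side keyed on Client_Id only: one row per key.
before = {("C1",): {"balance": 100}}

# Source ("after") side with duplicate keys.
source = [
    {"client_id": "C1", "balance": 100},  # exact copy of the target row
    {"client_id": "C1", "balance": 200},  # changed
    {"client_id": "C1", "balance": 100},  # another exact copy
]

def change_capture(before, source):
    """Tag each source row CDC-style: 0 = copy, 1 = insert, 3 = edit."""
    seen = set()
    out = []
    for row in source:
        key = (row["client_id"],)
        if key in before and key not in seen:
            # The single target row for this key is consumed here.
            seen.add(key)
            code = 0 if before[key]["balance"] == row["balance"] else 3
        else:
            # No unmatched target row left for this key: tagged insert.
            code = 1
        out.append((row, code))
    return out
```

Running this tags the three rows 0, 1, 1: only the first duplicate is compared properly, and the later exact copy still comes out as an insert, which is exactly the behavior Parag describes.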
Thank you very much!
etldwh_techie2050
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would think that your business rules regarding duplicate records would drive the solution. Are all of them processed? Only the first one encountered? Only the last / most recent record? The answer affects the solution.
-craig

"You can never have too many knives" -- Logan Nine Fingers
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

You probably want to process all the records. But sending every source transaction to be compared against every target transaction is not going to work, as there must be proper keys to uniquely identify each record.

Even if you are comparing 3 transactions in the source against 1 transaction in the target, you should not compare each source record with the target record in an SCD (in the general data warehouse sense, not the stage). You need to compare each record with its previous record.
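A rough sketch of the "compare each record with its previous record" idea (column names `client_id`, `txn_ts`, and `amount` are made up for illustration): sort by the key plus an ordering column, then diff each row against the adjacent row in the same key group instead of diffing every row against the target:

```python
def tag_rows(rows):
    """Sort by key, then by transaction order, and tag each row by
    comparing it with the previous row in the same key group."""
    rows = sorted(rows, key=lambda r: (r["client_id"], r["txn_ts"]))
    tags, prev = [], None
    for r in rows:
        if prev is None or prev["client_id"] != r["client_id"]:
            # First row for this key: this is the one that gets
            # compared against the target.
            tags.append("first-for-key")
        elif prev["amount"] != r["amount"]:
            # Delta within the same key, detected row-to-row.
            tags.append("changed-vs-previous")
        else:
            tags.append("duplicate")
        prev = r
    return tags

rows = [
    {"client_id": "C1", "txn_ts": 2, "amount": 50},
    {"client_id": "C1", "txn_ts": 1, "amount": 40},
    {"client_id": "C2", "txn_ts": 1, "amount": 10},
]
print(tag_rows(rows))
```

This is the delta-processing shape Kunal mentions: only the first row per key needs the target lookup, so duplicate keys no longer all collapse into inserts.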

It is possible to overcome this problem in delta processing, but it is near to impossible in a full comparison (mind the word "near").

But it all depends on your answers to the questions Craig asked.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
etldwh_techie2050
Participant
Posts: 9
Joined: Mon Jun 29, 2009 4:27 pm

Post by etldwh_techie2050 »

Chulett, all records are processed to keep history information, which is then displayed through web interfaces.
Thank you very much!
etldwh_techie2050
etldwh_techie2050
Participant
Posts: 9
Joined: Mon Jun 29, 2009 4:27 pm

Post by etldwh_techie2050 »

Kunal, thanks for your reply.
Thank you very much!
etldwh_techie2050
jcthornton
Premium Member
Posts: 79
Joined: Thu Mar 22, 2007 4:58 pm
Location: USA

Post by jcthornton »

The way to deal with this case is very carefully.

As indicated, only the first record for a given key will be matched. All others will come through as Inserts. You will need to identify the duplicate keys outside of the CDC/SCD stage.

One approach is to do this in a Transform stage.

1. Sort and partition the inputs by the keys. Add further sort (but not partition) criteria on the fields that determine the order in which records were created. These additional sort fields are usually in descending order (newest first).

2. On a change of key (use stage variables to detect this), apply your derivations as for a new insert / most recent Type 2 record.

3. On a repeated key, apply your derivations for superseded records.

Rinse and repeat until done.

I suggest sorting in descending order by date because usually the derivations are best determined that way. If your logic goes the other way, reverse the sort as needed.

If it should happen that both the prior record (if it exists) and the following record (if it exists) are needed for calculations, it will take two transformers in sequence with reversed sorts.
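The steps above can be sketched in Python (illustrative only: the column names are assumed, and in an actual parallel job the key-change check would live in transformer stage variables rather than a loop variable). Rows are sorted key-major with the newest first inside each key, so the first row seen per key becomes the current Type 2 record and every repeat is marked superseded:

```python
from operator import itemgetter

def mark_history(rows):
    """Sort key-major, newest first within key, and flag the first row
    per key as current ('Y') and all repeats as superseded ('N')."""
    # Two stable sorts: secondary (date, descending) first, then
    # primary (key), which preserves the date order within each key.
    rows = sorted(rows, key=itemgetter("created"), reverse=True)
    rows = sorted(rows, key=itemgetter("client_id"))
    prev_key = None  # plays the role of a transformer stage variable
    for r in rows:
        r["current_flag"] = "Y" if r["client_id"] != prev_key else "N"
        prev_key = r["client_id"]
    return rows

rows = [
    {"client_id": "C1", "created": "2009-06-01", "value": "a"},
    {"client_id": "C1", "created": "2009-06-15", "value": "b"},
    {"client_id": "C2", "created": "2009-06-10", "value": "x"},
]
for r in mark_history(rows):
    print(r["client_id"], r["created"], r["current_flag"])
```

With the sample rows, C1's 2009-06-15 record and C2's only record come out flagged current, while C1's older record is superseded, matching the "newest first" derivation order described above.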

As always, there are plenty of ways to complicate a design, especially when building off of partial explanations and hypothetical cases.
Jack Thornton
----------------
Spectacular achievement is always preceded by spectacular preparation - Robert H. Schuller