Transactional Data and Change Capture

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sbuk
Participant
Posts: 4
Joined: Fri Mar 27, 2009 1:22 pm

Transactional Data and Change Capture

Post by sbuk »

We are building a type 2 data warehouse where we expect multiple transactional updates from multiple source systems combined into one file by source system for end of day batch processing. The data architure is such that it is required to maintain each transaction history with the latest transaction as the only effective record and all others including historical transactions expired.

The problem I am trying resolve is to design the data integration process where one historical record can be compared to multiple input records in one data pass. A simple POC using the change capture stage produced the following:

Before Data
KEY|ChangeValue
P1|ABC123

After Data
KEY|ChangeValue|TransactionID
P1|DEF123|1
P1|ABC123|2
P1|XYZ789|3

The idea is to compare the record sets as whole rather than history to txn1, then txn1 to txn2, and finally txn2 to txn3. The expectation was the Change Capture would assign change code 3 to txn1 and txn3 and drop txn2. Then we would take the highest txnID (txnID3) and keep that effective and expire all others. This is not how it behaved.

Results of Change Capture Stage
KEY|ChangeValue|ChangeCode
P1|DEF123|3
P1|XYZ789|1

I am assuming the stage works in a way that after the first match it discards the old key and hence assigns a change code of 1 to txn3.

I may be overcomplicating the concept or have a completely wrong approach. I would sincerely appreciate anyone's feedback.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. What are you comparing against in the Change Capture stage? How are you ensuring that, for example, txn1 has been committed into that data set when processing txn2?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Trasactional Data and Change Capture

Post by SURA »

Is it something like rapidly changing dimension?

You have more than one record in the source for the same key?

DS User
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Re: Trasactional Data and Change Capture

Post by BI-RMA »

sbuk wrote:I am assuming the stage works in a way that after the first match it discards the old key and hence assigns a change code of 1 to txn3.
You are right. In change-data-capture each row from the before-stream is compared to the after-stream exactly once. Your expected result can be achieved by providing the same before-row three times (try that by using a row-generator-stage).

By the way: your actual result can only be achieved using some strange partioning options on the before and after datasets. Otherwise I would expect (without sorting by ChangeValue):

KEY|ChangeValue|ChangeCode
P1|DEF123|3
P1|ABC123|1
P1|XYZ789|1
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

To get what you want to achieve:
Sort after-dataset by KEY and TransactionID and remove duplicates on column KEY afterwards before Change-Capture. Keep the record with the highest value for TransactionID.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
sbuk
Participant
Posts: 4
Joined: Fri Mar 27, 2009 1:22 pm

Post by sbuk »

ray.wurlod wrote:Welcome aboard. What are you comparing against in the Change Capture stage? How are you ensuring that, for example, txn1 has been committed into that data set when processing txn2? ...
Thanks.

To answer your question, I am not ensuring txn1 is commited before processing, that's part of the challenge.
sbuk
Participant
Posts: 4
Joined: Fri Mar 27, 2009 1:22 pm

Post by sbuk »

After giving it some more thought I realized one flaw in the scenario I provided.

Before Data
KEY|ChangeValue
P1|ABC123

After Data
KEY|ChangeValue|TransactionID
P1|DEF123|1
P1|ABC123|2
P1|XYZ789|3

For txn2, we shouldn't be dropping the record as this is valid business change. The change for the ChangeValue column is from DEF123 to ABC123, so comparing to the historical value of ABC123 would be incorrect logic. Likewise removing duplicates would also discard business data which would be incorrect.

However, it is still a challenge to achieve the following result using a change capture stage.

KEY|ChangeValue|TransactionID|ChangeCode|CurrentRecord
P1|DEF123|1|3|N
P1|ABC123|2|3|N
P1|XYZ789|3|3|Y

I think I can achieve this result using a Transformer using sorted data on Key and Transaction ID and implement change compare logic in stage variables.

I don't want to give up on ChangeCapture stage, all your help is appreciated.
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

sbuk wrote:However, it is still a challenge to achieve the following result using a change capture stage.
I agree. Big one. :)
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
Post Reply