SCD stage functionality

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jgajardo
Premium Member
Premium Member
Posts: 15
Joined: Thu Sep 23, 2004 9:16 am
Contact:

SCD stage functionality

Post by jgajardo »

Hello everyone,

Long time listener... first time caller ... :wink:

I've spent the last couple of days trying to master
the behaviour of the SCD stage in order to implement
its usage on our spanking new project.

I followed the recipe from "IBM InfoSphere DataStage Data Flow and Job Design"
http://www.redbooks.ibm.com/redbooks/pdfs/sg247576.pdf
Section 2.14 Slowly Changing Dimension (pages 113 to 126).

Here's what I have so far:

Code: Select all

         Current                         Current  : Dimension reference records
            |                            New      : New "potential" dimension records 
            |                            SCD      : SCD stage (contains the surrogate key)
New--------SCD-------Dimension           Dimension: Dimension upsert records (update and insert)
            |                            Output   : Combination of data from "New" and "Current"
            |
          Output
I got everything working :) and here's an example of the results:

Code: Select all

*************
*    NEW    *
*************
                business_key    type_1        type_2         effective_date
                COLOR_CODE      COLOR_NAME    COLOR_GROUP    EFFECT_DATE   
                ------------    ----------    -----------    --------------
                FFFFFF          TRUE WHITE    LIGHT          2001-01-01    

				
*************
*  CURRENT  *
*************
surrogate_key    business_key    type_1        type_2         effective_date    expiry_date    current_indicator
COLOR_ID         COLOR_CODE      COLOR_NAME    COLOR_GROUP    EFFECT_DATE       EXPIR_DATE     CURRENT_IND
-------------    ------------    ----------    -----------    --------------    -----------    --------------
000000001        FFFFFF          WHITE         CLEAR          2000-01-01        2999-12-31     Y


Results of the change:

Code: Select all

*************************
*  DIMENSION  RESULT    *
*************************
surrogate_key    business_key    type_1        type_2         effective_date    expiry_date    current_indicator
COLOR_ID         COLOR_CODE      COLOR_NAME    COLOR_GROUP    EFFECT_DATE       EXPIR_DATE     CURRENT_IND
-------------    ------------    ----------    -----------    --------------    -----------    --------------
000000001        FFFFFF          WHITE         CLEAR          2000-01-01        2001-01-01     N
000000002        FFFFFF          TRUE WHITE    LIGHT          2001-01-01        2999-12-31     Y
My concern is mostly with the results. I need the SCD1 column complete history to be updated like this:

Code: Select all

*************************
*  DESIRED  RESULT      *
*************************
surrogate_key    business_key    type_1        type_2         effective_date	expiry_date    current_indicator
COLOR_ID         COLOR_CODE      COLOR_NAME    COLOR_GROUP    EFFECT_DATE       EXPIR_DATE     CURRENT_IND
-------------    ------------    ----------    -----------    --------------    -----------    --------------
000000001        FFFFFF          TRUE WHITE    CLEAR          2000-01-01        2001-01-01     N
000000002        FFFFFF          TRUE WHITE    LIGHT          2001-01-01        2999-12-31     Y
COLOR_NAME (type 1) column has to have the same value ("TRUE WHITE") for ALL records,
as opposed to having only the last record value updated.

I understand that the update that comes out of the SCD stage
is also used to expire the record, hence it is based on the surrogate key.

We could debate on the nature of the SCD1 column update (and believe me I have)
but you can find an explanation of what I need to do (by Ralph Kimball) here:
http://www.kimballgroup.com/html/design ... ingSCD.pdf


**************************************************
I need to be able to do an update based on the business key
to the entire history of the column when a change is detected.
**************************************************


Here are my 3 possible solutions using the SCD stage:


SOLUTION 1: Use the OUTPUT to detect the changes


By comparing the CURRENT value of the SCD type 1 column with the NEW value
I could create a new update record based on the business key.


PROBLEM: The output map doesn't return the previous value of the dimension (prior to the change).
It seems to return the value after the change (from memory).


This is what I see (same value for NEW and CURRENT):

Code: Select all

*************************
*        OUTPUT         *
*************************

NEW             NEW           CURRENT         CURRENT
business_key    type_1        business_key    type_1    
COLOR_CODE      COLOR_NAME    COLOR_CODE      COLOR_NAME
------------    ----------    ------------    ----------
FFFFFF          TRUE WHITE    FFFFFF          TRUE WHITE
This is what I would like to see:

Code: Select all

*************************
*    DESIRED OUTPUT     *
*************************

NEW             NEW           CURRENT         CURRENT
business_key    type_1        business_key    type_1    
COLOR_CODE      COLOR_NAME    COLOR_CODE      COLOR_NAME
------------    ----------    ------------    ----------
FFFFFF          TRUE WHITE         FFFFFF          WHITE 
Question 1: Is this a bug, or is the OUTPUT MAP supposed to work this way?:?

_____________________________________________________________________________________________________

SOLUTION 2: Use a "LOOKUP FILE SET" or "DATA SET" for the reference lookup (CURRENT)


By using a "LOOKUP FILE SET" or "DATA SET" for the CURRENT,
we could reuse it in the comparison of the DimUpd records from the SCD stage.

This would allow me to construct a second update file based on the business key.

PROBLEM: I cannot get the LOOKUP FILE SET to work.:?

Question 2: What stages work as a "reference lookup" for the SCD stage?

_____________________________________________________________________________________________________

SOLUTION 3: Reuse de UpdDim records to do a blind update on the SCD type 1 columns based on the business key


This is probably the simplest to code. All records that make it to the Upsert
are reused to update the SCD Type 1 columns without validating if a change took place on the SCD Type 1 column.

PROBLEM: I would be doing unnecessary updates when the values for the SCD type 1 column haven't changed,
which is probably most of the time :roll:

Question 3: Any other ideas on how to do this with the SCD stage?
_____________________________________________________________________________________________________


One final thought; I am aware that there methods of doing this wihout an SCD stage ...
but it would defeat the purpose of having an SCD stage if we don't use it


Thank you for your time!
Julio E. Gajardo

Life is what happens to you while your busy making other plans - John Lennon
jgajardo
Premium Member
Premium Member
Posts: 15
Joined: Thu Sep 23, 2004 9:16 am
Contact:

Re: SCD stage functionality

Post by jgajardo »

Figured this out! :D (thx Norman :idea: )
jgajardo wrote:Hello everyone,

PROBLEM: The output map doesn't return the previous value of the dimension (prior to the change).
It seems to return the value after the change (from memory).

[...]

Question 1: Is this a bug, or is the OUTPUT MAP supposed to work this way?:?
It is a feature and not a bug. :oops:

When you define the column as "Type 1" the output map returns the changed value in the column.
If you want to see the old value, you need to pass the column again in the reference.

This second definition of the column must not be defined as a "type 1".

ie: You don't pass it to the upsert, it goes from the reference lookup to the output map.

So the result in the ouput map is:

Code: Select all

NEW             NEW           CURRENT         CURRENT       CURRENT
business_key    type_1        business_key    type_1        (no purpose code)
COLOR_CODE      COLOR_NAME    COLOR_CODE      COLOR_NAME    COLOR_NAME_PREVIOUS
------------    ----------    ------------    ----------    -------------------
FFFFFF          TRUE WHITE    FFFFFF          TRUE WHITE    WHITE
I can now filter the records that were actually updated as SCD1
and create new updates based on the business key. 8)
Julio E. Gajardo

Life is what happens to you while your busy making other plans - John Lennon
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very nice - and excellent formatting as well. :D
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply