How to implement typeII slowly changing dimensions in star

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
narsingrp
Premium Member
Premium Member
Posts: 37
Joined: Wed Jan 21, 2004 10:38 pm

How to implement typeII slowly changing dimensions in star

Post by narsingrp »

Hi Guys,

Can anybody give me any idea on how to implement slowly changing dimensions type.II in star schema(dimensional modeling)

Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Step 1. Read the excellent book The Data Warehouse Lifecycle Toolkit by Ralph Kimball and others, so that you have a firm grasp of what is meant by Type 2 SCD.
Step 2. Search this forum for ideas. The topic has been extensively discussed.
Step 3. Enrol in DataStage Best Practices class (DS306), where handling SCDs is discussed.
Step 4. Examine the Intelligent Assistant for SCDs.

In summary, you need to load the dimension tables, generating surrogate keys as appropriate. You can use a sequence (or serial data type) in the database, or generate them within DataStage. The dimension tables should also preserve the natural, or business, key.
Once the dimension tables have been loaded, you need to provide a table where the natural (or business) key can be mapped to its current surrogate key value. In DataStage we normally use a hashed file for this, because it's fast. Perform the mapping as you're loading the fact table(s).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The implementation of a type 2 has data modeling components as well as ETL components. You must know how your data modeling for your type 2 is done so that you can write the appropriate ETL logic. The table typically contains a surrogate key, a natural key construct column, a starting effective date, an optional ending effective date, an optional current row indicator, and an optional version number.

After that, it's usually a matter of design an ETL flow that takes the source row, transforms it, compares it against the current row, and if something is materially different assigns a new surrogate key as it creates a new row. It optionally marks the current row as non-current as it updates the ending effective date. An option is a type 1/2 combination, where if something non-material is different you just update the current row (type 1), but if something material is different you create the newer row (type 2). Lastly, you may opt for an insert only design which would never mark the retiring current row with an ending effective date, so therefore you eliminate some of the optional columns.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ragu reddy
Participant
Posts: 8
Joined: Wed Dec 24, 2003 2:07 pm

Post by ragu reddy »

Thanks.But what exactly is the following .Is it a Ascential learning course.
Enrol in DataStage Best Practices class (DS306), where handling SCDs is discussed.
Examine the Intelligent Assistant for SCDs.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

DS306 is the Ascential course designed by Mike Favero, Art Walker, Dale Nichols, and myself. It takes you thru a generic example of building a data warehouse ETL application. As part of that course, you learn and practice the concept of using hash files as a work area or sandbox to help manipulate the data. Since the most common tasks that people have issues with are:

1. Implementing surrogate keys
2. Implementing changed data determination
3. Implementing a slowly changing dimension (which #1 and #2 are critical elements of)
4. Implementing an ETL solution positioned for using bulk loaders as the primary loading method.
5. Implementing an ETL solution that positions each job to maximize the core capabilities of DS Server jobs (job instantiation, row buffering, hash cache sharing, etc)
6. Implementing a restartable, auditable, scalable, repeatable solution where the ETL application looks like it was planned out from the beginning, as opposed to each job being one-of-a-kind.

With DS 7, there's an Intelligent Assistant for SCD's that's part of of the product. You have to wonder, don't you need to plan for a lot of things as well as just setting up an SCD job????

Anyway, in short, one is a course, the other is a wizard in the tool.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply