
How to implement Type 2 slowly changing dimensions in a star schema

Posted: Wed Apr 07, 2004 7:19 pm
by narsingrp
Hi Guys,

Can anybody give me an idea of how to implement Type 2 slowly changing dimensions in a star schema (dimensional modeling)?

Thanks in advance.

Posted: Wed Apr 07, 2004 8:06 pm
by ray.wurlod
Step 1. Read the excellent book The Data Warehouse Lifecycle Toolkit by Ralph Kimball and others, so that you have a firm grasp of what is meant by Type 2 SCD.
Step 2. Search this forum for ideas. The topic has been extensively discussed.
Step 3. Enrol in DataStage Best Practices class (DS306), where handling SCDs is discussed.
Step 4. Examine the Intelligent Assistant for SCDs.

In summary, you need to load the dimension tables, generating surrogate keys as appropriate. You can use a sequence (or serial data type) in the database, or generate them within DataStage. The dimension tables should also preserve the natural, or business, key.
Once the dimension tables have been loaded, you need to provide a table where the natural (or business) key can be mapped to its current surrogate key value. In DataStage we normally use a hashed file for this, because it's fast. Perform the mapping as you're loading the fact table(s).
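The two steps above (generate surrogate keys while loading the dimension, then map natural keys to current surrogate keys while loading facts) can be sketched in Python. This is only an illustration: the in-memory dict stands in for the DataStage hashed file, and all names (`key_map`, `load_dimension_row`, etc.) are made up for the example.

```python
# Illustrative sketch of surrogate-key generation and natural-key mapping.
# A plain dict stands in for the hashed file a DataStage job would use.

next_key = 0
key_map = {}  # natural (business) key -> current surrogate key


def load_dimension_row(natural_key, attrs):
    """Assign the next surrogate key to a dimension row, preserving the natural key."""
    global next_key
    next_key += 1
    key_map[natural_key] = next_key
    return {"surrogate_key": next_key, "natural_key": natural_key, **attrs}


def load_fact_row(natural_key, measures):
    """While loading facts, swap the natural key for its current surrogate key."""
    return {"dim_key": key_map[natural_key], **measures}


dim = load_dimension_row("CUST-001", {"city": "Perth"})
fact = load_fact_row("CUST-001", {"amount": 99.50})
```

The point of doing the lookup from a fast key/value store (the hashed file) rather than the database is exactly what the post says: the fact load performs one lookup per row, so lookup speed dominates.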

Posted: Wed Apr 07, 2004 8:36 pm
by kcbland
The implementation of a type 2 has data modeling components as well as ETL components. You must know how your data modeling for your type 2 is done so that you can write the appropriate ETL logic. The table typically contains a surrogate key, a natural key construct column, a starting effective date, an optional ending effective date, an optional current row indicator, and an optional version number.

After that, it's usually a matter of designing an ETL flow that takes the source row, transforms it, compares it against the current row, and, if something is materially different, assigns a new surrogate key as it creates a new row. It optionally marks the current row as non-current as it updates the ending effective date. One option is a type 1/2 combination, where if something non-material is different you just update the current row in place (type 1), but if something material is different you create a newer row (type 2). Lastly, you may opt for an insert-only design that never marks the retiring current row with an ending effective date, which lets you eliminate some of the optional columns.
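The compare-and-version flow described above can be sketched as follows. This is a hedged illustration, not DataStage logic: the table is a Python list, and the column names (`start_date`, `is_current`, `version`, etc.) are just the typical Type 2 columns the post lists.

```python
from datetime import date

# Sketch of the Type 2 flow: compare the incoming row to the current
# dimension row; if a tracked attribute differs, retire the current row
# (end date + current flag) and insert a new version under a new
# surrogate key. All names here are illustrative.

dimension = []      # the dimension table, as a list of dicts
current_by_nk = {}  # natural key -> index of the current row
_next_sk = [0]      # surrogate key counter


def apply_type2(natural_key, attrs, effective):
    idx = current_by_nk.get(natural_key)
    if idx is not None:
        cur = dimension[idx]
        if cur["attrs"] == attrs:
            return cur  # nothing materially different: keep current row
        # Retire the current row.
        cur["end_date"] = effective
        cur["is_current"] = False
    _next_sk[0] += 1
    row = {
        "surrogate_key": _next_sk[0],
        "natural_key": natural_key,
        "attrs": attrs,
        "start_date": effective,
        "end_date": None,
        "is_current": True,
        "version": 1 if idx is None else dimension[idx]["version"] + 1,
    }
    dimension.append(row)
    current_by_nk[natural_key] = len(dimension) - 1
    return row


v1 = apply_type2("C1", {"city": "Perth"}, date(2004, 1, 1))
v2 = apply_type2("C1", {"city": "Sydney"}, date(2004, 4, 7))
```

An insert-only variant, as the post notes, would simply drop the "retire the current row" step (and with it the `end_date` and `is_current` columns), relying on `start_date` or `version` to find the latest row.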

Posted: Fri Apr 09, 2004 6:44 pm
by ragu reddy
Thanks. But what exactly are the following? Are they Ascential learning courses?
"Enrol in DataStage Best Practices class (DS306), where handling SCDs is discussed."
"Examine the Intelligent Assistant for SCDs."

Posted: Fri Apr 09, 2004 8:14 pm
by kcbland
DS306 is the Ascential course designed by Mike Favero, Art Walker, Dale Nichols, and myself. It takes you through a generic example of building a data warehouse ETL application. As part of that course, you learn and practice the concept of using hash files as a work area or sandbox to help manipulate the data. The most common tasks people have issues with are:

1. Implementing surrogate keys
2. Implementing changed data determination
3. Implementing a slowly changing dimension (which #1 and #2 are critical elements of)
4. Implementing an ETL solution positioned for using bulk loaders as the primary loading method
5. Implementing an ETL solution that positions each job to maximize the core capabilities of DS Server jobs (job instantiation, row buffering, hash cache sharing, etc)
6. Implementing a restartable, auditable, scalable, repeatable solution where the ETL application looks like it was planned out from the beginning, as opposed to each job being one-of-a-kind.

With DS 7, there's an Intelligent Assistant for SCDs that's part of the product. You have to wonder, though: don't you need to plan for a lot of other things as well, not just set up an SCD job?

Anyway, in short, one is a course, the other is a wizard in the tool.