SCD type at column level

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland
Contact:

SCD type at column level

Post by Luk »

Hello!

Some time ago there was a discussion about multiple SCD types in one dimension. Original post is here:
viewtopic.php?t=93674&highlight=scd

The main conclusion was that Type 1 or 2 is for whole dimension, not for single field in dimension.
Recently I have read Kimbal's book DW ETL Toolkit and there is another method called Hybrid SCD. Ralph says that SCD type at field level is possible. You may have 3 kinds of SCDs in one dimension (i.e. column1 is type 1, column2 is type3 and column 3 and 4 is type2). When you process dimension you first manage types 2 and 3 and then make update on all fields with correct natural (source) keys.

How do you comment this issue? Do you think this is good idea? Is there some limitations of this solution (for example performance)?

Thanks for any comments!
LUK
wnogalski
Charter Member
Charter Member
Posts: 54
Joined: Thu Jan 06, 2005 10:49 am
Location: Warsaw

Post by wnogalski »

Hybrid SCDs are common - sometimes there's no need to track history for some fields - f.e. consumer name or sex :wink: Actually it's the business people who decide what do they want to track - so they decide if it's a good idea or not.
As in the link You gave, there is one approach that may help You with Hybrid SCDs - simply divide the dimension into two - one will be Type 1 and the second Type 2 - see Ralph's "Data Warehouse Lifecycle Toolkit" page 184 :)
Regards,
Wojciech Nogalski
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You might also like to investigate the concept of rapidly changing dimensions, so that you have the complete picture.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I still think this is ugly. I think on a certain level they want to know what is an elogant solution and not ugly. I think the trade off is mutiple records versus a single record. It is easier to compare previous values if it is stored all in one record. How often is this a requirement in your reporting environment that you know the previous value? This may determine SCD Hybrid or type 2 or 3.

I think type 2 is easy to follow and implement. I think it garuntees reporting accuracy at the dimension level. I do not care as to how rapid this dimension is changing. There is a cost as to the loading time of facts. You need to find the correct dimension key to insert into the fact based on the date of the fact.

Lets say customer 111 has several facts. We want to report credit card trasnactions or facts based on the customer zip code. Therefore we decide to use SCD type 2 dimensions. So maybe the first time we load customer 111 it gets surrogate key 2222 with zip code 12345 in Maryland. Next time it changes on 1/15/2004 to zip code 77777 in Alaska with surrogate key 3211. We need to know the date of the transaction to pick either 3211 or 2222 for this customer.

It appears to me that these other SCDs can tell you how many people moved from Maryland to Alaska within one record but what other advantage does this have. Now within that one record we move fields around from ZIP1 to ZIP2.

I think the question was more about this type of impression. I would like to hear others opinions about advatages and disadvantages.
Mamu Kim
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland
Contact:

Post by Luk »

I think the question was more about this type of impression. I would like to hear others opinions about advatages and disadvantages.
correct :)
LUK
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I think going the hybrid route is a bad idea and I'll give you an example. Let's say you have a table named DIM_STORE. You decide that some columns in the table are type 1 and some are type 2. To avoid anomolies, each time any of the type 1 attributes change you must go back and update all rows based on the natural key.

Example

Code: Select all


natural key  surrogate key          type1_col   type2_col

1                 1                   ABC        DEF                   Expired
1                 2                   ABC        GHI                   Expired
1                 3                   ZZZ        XXX                   Active

In the above example where a type 1 column changed in the third record you must go back and update all of the records with that natural key.

Code: Select all


natural key  surrogate key          type1_col type2_col

1                 1                   ZZZ        DEF                   Expired
1                 2                   ZZZ        GHI                   Expired
1                 3                   ZZZ        XXX                   Active

This adds a level of complexity that is not necessary. You may choose to not update the other two rows, but then reports would be wrong since the real value for type1_col should be ZZZ, but will be reported as ABC.

I know that Ralph and others write about doing hybrids, but I don't think it's a really solid solution.

Just my opinion.
Post Reply