Hello!
Some time ago there was a discussion about multiple SCD types in one dimension. Original post is here:
viewtopic.php?t=93674&highlight=scd
The main conclusion was that Type 1 or 2 is for whole dimension, not for single field in dimension.
Recently I have read Kimbal's book DW ETL Toolkit and there is another method called Hybrid SCD. Ralph says that SCD type at field level is possible. You may have 3 kinds of SCDs in one dimension (i.e. column1 is type 1, column2 is type3 and column 3 and 4 is type2). When you process dimension you first manage types 2 and 3 and then make update on all fields with correct natural (source) keys.
How do you comment this issue? Do you think this is good idea? Is there some limitations of this solution (for example performance)?
Thanks for any comments!
SCD type at column level
Moderators: chulett, rschirm, roy
Hybrid SCDs are common - sometimes there's no need to track history for some fields - f.e. consumer name or sex
Actually it's the business people who decide what do they want to track - so they decide if it's a good idea or not.
As in the link You gave, there is one approach that may help You with Hybrid SCDs - simply divide the dimension into two - one will be Type 1 and the second Type 2 - see Ralph's "Data Warehouse Lifecycle Toolkit" page 184![Smile :)](./images/smilies/icon_smile.gif)
![Wink :wink:](./images/smilies/icon_wink.gif)
As in the link You gave, there is one approach that may help You with Hybrid SCDs - simply divide the dimension into two - one will be Type 1 and the second Type 2 - see Ralph's "Data Warehouse Lifecycle Toolkit" page 184
![Smile :)](./images/smilies/icon_smile.gif)
Regards,
Wojciech Nogalski
Wojciech Nogalski
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I still think this is ugly. I think on a certain level they want to know what is an elogant solution and not ugly. I think the trade off is mutiple records versus a single record. It is easier to compare previous values if it is stored all in one record. How often is this a requirement in your reporting environment that you know the previous value? This may determine SCD Hybrid or type 2 or 3.
I think type 2 is easy to follow and implement. I think it garuntees reporting accuracy at the dimension level. I do not care as to how rapid this dimension is changing. There is a cost as to the loading time of facts. You need to find the correct dimension key to insert into the fact based on the date of the fact.
Lets say customer 111 has several facts. We want to report credit card trasnactions or facts based on the customer zip code. Therefore we decide to use SCD type 2 dimensions. So maybe the first time we load customer 111 it gets surrogate key 2222 with zip code 12345 in Maryland. Next time it changes on 1/15/2004 to zip code 77777 in Alaska with surrogate key 3211. We need to know the date of the transaction to pick either 3211 or 2222 for this customer.
It appears to me that these other SCDs can tell you how many people moved from Maryland to Alaska within one record but what other advantage does this have. Now within that one record we move fields around from ZIP1 to ZIP2.
I think the question was more about this type of impression. I would like to hear others opinions about advatages and disadvantages.
I think type 2 is easy to follow and implement. I think it garuntees reporting accuracy at the dimension level. I do not care as to how rapid this dimension is changing. There is a cost as to the loading time of facts. You need to find the correct dimension key to insert into the fact based on the date of the fact.
Lets say customer 111 has several facts. We want to report credit card trasnactions or facts based on the customer zip code. Therefore we decide to use SCD type 2 dimensions. So maybe the first time we load customer 111 it gets surrogate key 2222 with zip code 12345 in Maryland. Next time it changes on 1/15/2004 to zip code 77777 in Alaska with surrogate key 3211. We need to know the date of the transaction to pick either 3211 or 2222 for this customer.
It appears to me that these other SCDs can tell you how many people moved from Maryland to Alaska within one record but what other advantage does this have. Now within that one record we move fields around from ZIP1 to ZIP2.
I think the question was more about this type of impression. I would like to hear others opinions about advatages and disadvantages.
Mamu Kim
I think going the hybrid route is a bad idea and I'll give you an example. Let's say you have a table named DIM_STORE. You decide that some columns in the table are type 1 and some are type 2. To avoid anomolies, each time any of the type 1 attributes change you must go back and update all rows based on the natural key.
Example
In the above example where a type 1 column changed in the third record you must go back and update all of the records with that natural key.
This adds a level of complexity that is not necessary. You may choose to not update the other two rows, but then reports would be wrong since the real value for type1_col should be ZZZ, but will be reported as ABC.
I know that Ralph and others write about doing hybrids, but I don't think it's a really solid solution.
Just my opinion.
Example
Code: Select all
natural key surrogate key type1_col type2_col
1 1 ABC DEF Expired
1 2 ABC GHI Expired
1 3 ZZZ XXX Active
Code: Select all
natural key surrogate key type1_col type2_col
1 1 ZZZ DEF Expired
1 2 ZZZ GHI Expired
1 3 ZZZ XXX Active
I know that Ralph and others write about doing hybrids, but I don't think it's a really solid solution.
Just my opinion.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com