Loading a DWH using Surrogate Key Generator / SCD stage
Posted: Fri Jul 10, 2009 11:15 am
Hello guys,
First I want to say that I have read the documentation, downloaded the red book and the example from IBM. I just need to know a few things that I cannot manage to clearly understand and I wuld just liek to know the general strategy. I use SK everywhere.
I have a DWH to load with dimensions (90% are type I 10% are type II) and 2 fact tables.
So I was thinking this to do it easily :
1) Create a job for each SCD1 dimension to load them in Oracle with the values retrieved from the operational system.
I have inserted a SK gen stage between my AS400 operational source and the Oracle target which create the ID for each business key. I have created the file with a standalone SK gen first.
=> It works great (I understand why there are gaps since I use PX and I don't care) but I want to make sure next time I run the job I can the keep the same IDs. Apparently the SK gen stage doesn't match the SK key with the business key (I think it is normal actually since I cannot link a BK to a SK in the surrogate key generator).
So my question is : to keep the same ID an add the new records I have to use a SCD stage ?
2) In the IBM example the primary source is fact data. I would have assumed in the first place that the primary link contains the current values and that you run them against the old one to detect changes. Why are fact data (sales amounts) are involved ? That's weird to me.
Usually, I would think to match facts with dims only when all dims are loaded and then find out ONLY at the end how to convert BK into SK in the fact table.
So I would like to know your classic DWH loading strategy using surrogate keys for primary keys with SCD ?
Thanks a lot.
First I want to say that I have read the documentation, downloaded the red book and the example from IBM. I just need to know a few things that I cannot manage to clearly understand and I wuld just liek to know the general strategy. I use SK everywhere.
I have a DWH to load with dimensions (90% are type I 10% are type II) and 2 fact tables.
So I was thinking this to do it easily :
1) Create a job for each SCD1 dimension to load them in Oracle with the values retrieved from the operational system.
I have inserted a SK gen stage between my AS400 operational source and the Oracle target which create the ID for each business key. I have created the file with a standalone SK gen first.
=> It works great (I understand why there are gaps since I use PX and I don't care) but I want to make sure next time I run the job I can the keep the same IDs. Apparently the SK gen stage doesn't match the SK key with the business key (I think it is normal actually since I cannot link a BK to a SK in the surrogate key generator).
So my question is : to keep the same ID an add the new records I have to use a SCD stage ?
2) In the IBM example the primary source is fact data. I would have assumed in the first place that the primary link contains the current values and that you run them against the old one to detect changes. Why are fact data (sales amounts) are involved ? That's weird to me.
Usually, I would think to match facts with dims only when all dims are loaded and then find out ONLY at the end how to convert BK into SK in the fact table.
So I would like to know your classic DWH loading strategy using surrogate keys for primary keys with SCD ?
Thanks a lot.