How to take care of conformed dimensions
Moderators: chulett, rschirm, roy
How to take care of conformed dimensions
Hi Gurus
I am having a scenario in which two fact tables are sharing two set of conformed dimensions.
One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.
Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.
If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.
Or do we need to design something differnt to achieve this.
Thanks
dsedi.
I am having a scenario in which two fact tables are sharing two set of conformed dimensions.
One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.
Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.
If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.
Or do we need to design something differnt to achieve this.
Thanks
dsedi.
Did you mean that both the fact table stores same data at the different grain(One at day level and other at month level).
If the above comment is true then dimension should represent the grain of the fact. I think this can be solved in two ways:
Include a two surrogate key in the dimension table. One will repraesent the day level which will change for every day and other for month level(One month will have the same key value) which will change for every month. Use respective key for the respective fact table.
Another method is created a derived dimension(For month level which will have month level attribute and surrogate key) from day level diemension and use the respective dimension to the respective fact table. I prefer the second option.
If the above comment is true then dimension should represent the grain of the fact. I think this can be solved in two ways:
Include a two surrogate key in the dimension table. One will repraesent the day level which will change for every day and other for month level(One month will have the same key value) which will change for every month. Use respective key for the respective fact table.
Another method is created a derived dimension(For month level which will have month level attribute and surrogate key) from day level diemension and use the respective dimension to the respective fact table. I prefer the second option.
Hi Kumar
Thanks for your inputs. I wanted to know, if frequency also is considered while modeling a Conformed dimension. If that is true, than this dimension should not be conformed.
You mentioned that ou prefer second approach. In the second approach, could you brief me about how and where are you deriving the dimension. Is it in the staging area or in the target region and how. Is it a view
Thanks for your inputs. I wanted to know, if frequency also is considered while modeling a Conformed dimension. If that is true, than this dimension should not be conformed.
You mentioned that ou prefer second approach. In the second approach, could you brief me about how and where are you deriving the dimension. Is it in the staging area or in the target region and how. Is it a view
Re: How to take care of conformed dimensions
Conformed just means that there is only one version of dimension. You are allowed to have difference copies of that same table on different refresh frequencies.dsedi wrote:
One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.
Dimensions don't have a grain, that's a fact characteristic. Dimensions are things like customers, products, people, buildings. Things you can walk up to and kick. The Time dimension is the closest thing to having a series of grains. You could have a Year, Day, Quarter, Period, etc family of tables that make up the Time subject area, but it's still a set of dimension tables.dsedi wrote:
Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.
Fact grains change when you lose dimensions and have to aggregate columns to adjust for the loss. For example, a sales order fact could be a product sold to a customer on a given date. Removing the product from the fact now gives you a different grain (an aggregate) and you summarize the attributes in some fashion, like total order amount for that customer on that date.dsedi wrote:
If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.
Or do we need to design something differnt to achieve this.
You are free to have tables on different refresh cycles depending on the need of the end user, hence the "data mart" concept.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Re: How to take care of conformed dimensions
That would mean the OP needs CUST_MONTHLY_DIM and CUST_DAILY_DIM, which are essentially the same version of the CUST_DIM, but with different refresh frequencies.kcbland wrote:Conformed just means that there is only one version of dimension. You are allowed to have difference copies of that same table on different refresh frequencies.
gateleys
Thanks for your inputs.
So, that implies that for a logical dimension CUST_DIM, I will be using two physical tables created in target having same structures but different data loaded based on different frequencies from the set of source tables.
Do we do it this way, or can we avoid this by having SCD type 2
So, that implies that for a logical dimension CUST_DIM, I will be using two physical tables created in target having same structures but different data loaded based on different frequencies from the set of source tables.
Do we do it this way, or can we avoid this by having SCD type 2
SCD Type 2 is a totally different discussion. You deliver data to users at the refresh frequency they desire. It's perfectly understandable that finance folks may want their copies of tables refreshed only after end of month, but marketing and sales folks may want daily refreshes in order to anticipate trends and such.
SCD Type 2 is a means of tracking changes in the dimension information and allocating ranges of time where each version was effective. Refreshing daily or weekly won't cause a fact to reference a different version of a dimension row if there aren't any versions other than one.
Ex: A product dimension row may change 4 times a year, so the fact table would have a foreign key pointing to the product dimension version effective when the fact occurred. It's the nature of an SCD to have all of its versions of rows in use by facts, as the newer facts point to the newer versions in the table and the historical facts are pointing to their appropriate version.
SCD type 2 and refresh frequency is like apples to toaster ovens ( to Tom Kyte).
SCD Type 2 is a means of tracking changes in the dimension information and allocating ranges of time where each version was effective. Refreshing daily or weekly won't cause a fact to reference a different version of a dimension row if there aren't any versions other than one.
Ex: A product dimension row may change 4 times a year, so the fact table would have a foreign key pointing to the product dimension version effective when the fact occurred. It's the nature of an SCD to have all of its versions of rows in use by facts, as the newer facts point to the newer versions in the table and the historical facts are pointing to their appropriate version.
SCD type 2 and refresh frequency is like apples to toaster ovens ( to Tom Kyte).
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Re: How to take care of conformed dimensions
Well, if you look at Kimball's definition (see, for example, his book, The Data Warehouse Toolkit, 2nd edition, page 82- 87; more specifically 3rd paragraph of page 83) dimensions do have grain.Dimensions don't have a grain, that's a fact characteristic. Dimensions are things like customers, products, people, buildings. Things you can walk up to and kick. The Time dimension is the closest thing to having a series of grains. You could have a Year, Day, Quarter, Period, etc family of tables that make up the Time subject area, but it's still a set of dimension tables.
Last edited by djoni on Tue Sep 04, 2007 1:05 pm, edited 2 times in total.
Re: How to take care of conformed dimensions
Take a look at Kimball's book, The Data Warehouse Toolkit, 2nd edition, page 82 - 87.dsedi wrote:Hi Gurus
I am having a scenario in which two fact tables are sharing two set of conformed dimensions.
One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.
Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.
If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.
Or do we need to design something differnt to achieve this.
Thanks
dsedi.
Is creating time variant snapshots or respective tables for conformed Dimensions and Conformed facts the answer to the above. Or is there any other approach to resolve this.
Can anyone also let me know if we need a holding area where we can hold one set of fact and conformed dimensions while the other set which is complete can be published.
Thanks
Can anyone also let me know if we need a holding area where we can hold one set of fact and conformed dimensions while the other set which is complete can be published.
Thanks
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Load frequency of the dimension is not really a concern. As said earlier, dimensions don't have grain, facts have grain. The issue you will have, is determining how the architecture of your data warehouse will account for differences in frequency of loads (data latency).
If you were to load a dimension daily, and you run the fact after the dimension build is complete it is possible for your fact to have a reference to a dimension row that does not exist yet. You can reject the row and reprocess the reject on the next fact run (which would come after the next dimension load, that would then have the row loaded). You could also build your ETL to account for late arriving dimensions. This would then insert missing dimensional rows (key, and surrogate key only) into the dimension. When the dimension is run aftterward it will update the row with the complete information.
You could try to find a way to synch up all of your timings, but as your data warehouse grows this will start to unravel. You can run your dimension loads more frequently, but as you do this you will start thrashing your data warehouse. Every fact and every customer may have a different requirement for fact latency, you are going to end up running your dimensions over and over again trying to keep up with those requirements. You have a simple sounding yet difficult architectural issue that your firm needs to consider how to handle. You should talk to your ETL architect about the issue and see what direction should be pursued. It is really dependent on how your existing processes are architected.
If you were to load a dimension daily, and you run the fact after the dimension build is complete it is possible for your fact to have a reference to a dimension row that does not exist yet. You can reject the row and reprocess the reject on the next fact run (which would come after the next dimension load, that would then have the row loaded). You could also build your ETL to account for late arriving dimensions. This would then insert missing dimensional rows (key, and surrogate key only) into the dimension. When the dimension is run aftterward it will update the row with the complete information.
You could try to find a way to synch up all of your timings, but as your data warehouse grows this will start to unravel. You can run your dimension loads more frequently, but as you do this you will start thrashing your data warehouse. Every fact and every customer may have a different requirement for fact latency, you are going to end up running your dimensions over and over again trying to keep up with those requirements. You have a simple sounding yet difficult architectural issue that your firm needs to consider how to handle. You should talk to your ETL architect about the issue and see what direction should be pursued. It is really dependent on how your existing processes are architected.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com