Page 1 of 1

How to take care of conformed dimensions

Posted: Thu Aug 30, 2007 10:44 pm
by dsedi
Hi Gurus

I am having a scenario in which two fact tables are sharing two set of conformed dimensions.

One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.

Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.

If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.

Or do we need to design something differnt to achieve this.

Thanks
dsedi.

Posted: Fri Aug 31, 2007 8:05 am
by dsedi
Can anyone give their thoughts on this. :idea:

Posted: Fri Aug 31, 2007 9:11 am
by dsedi
Wanted to add that, these dimensions are type 1 and not type 2.

Posted: Fri Aug 31, 2007 9:41 pm
by bkumar103
Did you mean that both the fact table stores same data at the different grain(One at day level and other at month level).

If the above comment is true then dimension should represent the grain of the fact. I think this can be solved in two ways:

Include a two surrogate key in the dimension table. One will repraesent the day level which will change for every day and other for month level(One month will have the same key value) which will change for every month. Use respective key for the respective fact table.

Another method is created a derived dimension(For month level which will have month level attribute and surrogate key) from day level diemension and use the respective dimension to the respective fact table. I prefer the second option.

Posted: Tue Sep 04, 2007 9:02 am
by dsedi
Hi Kumar

Thanks for your inputs. I wanted to know, if frequency also is considered while modeling a Conformed dimension. If that is true, than this dimension should not be conformed.

You mentioned that ou prefer second approach. In the second approach, could you brief me about how and where are you deriving the dimension. Is it in the staging area or in the target region and how. Is it a view :?:

Re: How to take care of conformed dimensions

Posted: Tue Sep 04, 2007 9:28 am
by kcbland
dsedi wrote:

One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.
Conformed just means that there is only one version of dimension. You are allowed to have difference copies of that same table on different refresh frequencies.
dsedi wrote:
Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.
Dimensions don't have a grain, that's a fact characteristic. Dimensions are things like customers, products, people, buildings. Things you can walk up to and kick. The Time dimension is the closest thing to having a series of grains. You could have a Year, Day, Quarter, Period, etc family of tables that make up the Time subject area, but it's still a set of dimension tables.
dsedi wrote:
If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.

Or do we need to design something differnt to achieve this.
Fact grains change when you lose dimensions and have to aggregate columns to adjust for the loss. For example, a sales order fact could be a product sold to a customer on a given date. Removing the product from the fact now gives you a different grain (an aggregate) and you summarize the attributes in some fashion, like total order amount for that customer on that date.

You are free to have tables on different refresh cycles depending on the need of the end user, hence the "data mart" concept.

Re: How to take care of conformed dimensions

Posted: Tue Sep 04, 2007 9:40 am
by gateleys
kcbland wrote:Conformed just means that there is only one version of dimension. You are allowed to have difference copies of that same table on different refresh frequencies.
That would mean the OP needs CUST_MONTHLY_DIM and CUST_DAILY_DIM, which are essentially the same version of the CUST_DIM, but with different refresh frequencies.

Posted: Tue Sep 04, 2007 10:31 am
by dsedi
Thanks for your inputs.

So, that implies that for a logical dimension CUST_DIM, I will be using two physical tables created in target having same structures but different data loaded based on different frequencies from the set of source tables.

Do we do it this way, or can we avoid this by having SCD type 2 :idea:

Posted: Tue Sep 04, 2007 11:00 am
by kcbland
SCD Type 2 is a totally different discussion. You deliver data to users at the refresh frequency they desire. It's perfectly understandable that finance folks may want their copies of tables refreshed only after end of month, but marketing and sales folks may want daily refreshes in order to anticipate trends and such.

SCD Type 2 is a means of tracking changes in the dimension information and allocating ranges of time where each version was effective. Refreshing daily or weekly won't cause a fact to reference a different version of a dimension row if there aren't any versions other than one.

Ex: A product dimension row may change 4 times a year, so the fact table would have a foreign key pointing to the product dimension version effective when the fact occurred. It's the nature of an SCD to have all of its versions of rows in use by facts, as the newer facts point to the newer versions in the table and the historical facts are pointing to their appropriate version.

SCD type 2 and refresh frequency is like apples to toaster ovens ( :wink: to Tom Kyte).

Posted: Tue Sep 04, 2007 11:48 am
by dsedi
Kenneth, So, in this case, I should go ahead and create two physical tables using the same conformed dimension structure.

Can create a multi instance job, one for daily and the other for Monthly to populate respective dimension tables?

Am I right?

Re: How to take care of conformed dimensions

Posted: Tue Sep 04, 2007 12:59 pm
by djoni
Dimensions don't have a grain, that's a fact characteristic. Dimensions are things like customers, products, people, buildings. Things you can walk up to and kick. The Time dimension is the closest thing to having a series of grains. You could have a Year, Day, Quarter, Period, etc family of tables that make up the Time subject area, but it's still a set of dimension tables.
Well, if you look at Kimball's definition (see, for example, his book, The Data Warehouse Toolkit, 2nd edition, page 82- 87; more specifically 3rd paragraph of page 83) dimensions do have grain.

Re: How to take care of conformed dimensions

Posted: Tue Sep 04, 2007 1:01 pm
by djoni
dsedi wrote:Hi Gurus

I am having a scenario in which two fact tables are sharing two set of conformed dimensions.

One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.

Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.

If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.

Or do we need to design something differnt to achieve this.

Thanks
dsedi.
Take a look at Kimball's book, The Data Warehouse Toolkit, 2nd edition, page 82 - 87.

Posted: Wed Sep 05, 2007 7:43 am
by dsedi
Is creating time variant snapshots or respective tables for conformed Dimensions and Conformed facts the answer to the above. Or is there any other approach to resolve this.

Can anyone also let me know if we need a holding area where we can hold one set of fact and conformed dimensions while the other set which is complete can be published.

Thanks

Posted: Wed Sep 05, 2007 2:06 pm
by kwwilliams
Load frequency of the dimension is not really a concern. As said earlier, dimensions don't have grain, facts have grain. The issue you will have, is determining how the architecture of your data warehouse will account for differences in frequency of loads (data latency).

If you were to load a dimension daily, and you run the fact after the dimension build is complete it is possible for your fact to have a reference to a dimension row that does not exist yet. You can reject the row and reprocess the reject on the next fact run (which would come after the next dimension load, that would then have the row loaded). You could also build your ETL to account for late arriving dimensions. This would then insert missing dimensional rows (key, and surrogate key only) into the dimension. When the dimension is run aftterward it will update the row with the complete information.

You could try to find a way to synch up all of your timings, but as your data warehouse grows this will start to unravel. You can run your dimension loads more frequently, but as you do this you will start thrashing your data warehouse. Every fact and every customer may have a different requirement for fact latency, you are going to end up running your dimensions over and over again trying to keep up with those requirements. You have a simple sounding yet difficult architectural issue that your firm needs to consider how to handle. You should talk to your ETL architect about the issue and see what direction should be pursued. It is really dependent on how your existing processes are architected.