How to take care of conformed dimensions

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

How to take care of conformed dimensions

Post by dsedi »

Hi Gurus

I am having a scenario in which two fact tables are sharing two set of conformed dimensions.

One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.

Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.

If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.

Or do we need to design something differnt to achieve this.

Thanks
dsedi.
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Can anyone give their thoughts on this. :idea:
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Wanted to add that, these dimensions are type 1 and not type 2.
bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai

Post by bkumar103 »

Did you mean that both the fact table stores same data at the different grain(One at day level and other at month level).

If the above comment is true then dimension should represent the grain of the fact. I think this can be solved in two ways:

Include a two surrogate key in the dimension table. One will repraesent the day level which will change for every day and other for month level(One month will have the same key value) which will change for every month. Use respective key for the respective fact table.

Another method is created a derived dimension(For month level which will have month level attribute and surrogate key) from day level diemension and use the respective dimension to the respective fact table. I prefer the second option.
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Hi Kumar

Thanks for your inputs. I wanted to know, if frequency also is considered while modeling a Conformed dimension. If that is true, than this dimension should not be conformed.

You mentioned that ou prefer second approach. In the second approach, could you brief me about how and where are you deriving the dimension. Is it in the staging area or in the target region and how. Is it a view :?:
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Re: How to take care of conformed dimensions

Post by kcbland »

dsedi wrote:

One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.
Conformed just means that there is only one version of dimension. You are allowed to have difference copies of that same table on different refresh frequencies.
dsedi wrote:
Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.
Dimensions don't have a grain, that's a fact characteristic. Dimensions are things like customers, products, people, buildings. Things you can walk up to and kick. The Time dimension is the closest thing to having a series of grains. You could have a Year, Day, Quarter, Period, etc family of tables that make up the Time subject area, but it's still a set of dimension tables.
dsedi wrote:
If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.

Or do we need to design something differnt to achieve this.
Fact grains change when you lose dimensions and have to aggregate columns to adjust for the loss. For example, a sales order fact could be a product sold to a customer on a given date. Removing the product from the fact now gives you a different grain (an aggregate) and you summarize the attributes in some fashion, like total order amount for that customer on that date.

You are free to have tables on different refresh cycles depending on the need of the end user, hence the "data mart" concept.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: How to take care of conformed dimensions

Post by gateleys »

kcbland wrote:Conformed just means that there is only one version of dimension. You are allowed to have difference copies of that same table on different refresh frequencies.
That would mean the OP needs CUST_MONTHLY_DIM and CUST_DAILY_DIM, which are essentially the same version of the CUST_DIM, but with different refresh frequencies.
gateleys
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Thanks for your inputs.

So, that implies that for a logical dimension CUST_DIM, I will be using two physical tables created in target having same structures but different data loaded based on different frequencies from the set of source tables.

Do we do it this way, or can we avoid this by having SCD type 2 :idea:
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

SCD Type 2 is a totally different discussion. You deliver data to users at the refresh frequency they desire. It's perfectly understandable that finance folks may want their copies of tables refreshed only after end of month, but marketing and sales folks may want daily refreshes in order to anticipate trends and such.

SCD Type 2 is a means of tracking changes in the dimension information and allocating ranges of time where each version was effective. Refreshing daily or weekly won't cause a fact to reference a different version of a dimension row if there aren't any versions other than one.

Ex: A product dimension row may change 4 times a year, so the fact table would have a foreign key pointing to the product dimension version effective when the fact occurred. It's the nature of an SCD to have all of its versions of rows in use by facts, as the newer facts point to the newer versions in the table and the historical facts are pointing to their appropriate version.

SCD type 2 and refresh frequency is like apples to toaster ovens ( :wink: to Tom Kyte).
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Kenneth, So, in this case, I should go ahead and create two physical tables using the same conformed dimension structure.

Can create a multi instance job, one for daily and the other for Monthly to populate respective dimension tables?

Am I right?
djoni
Participant
Posts: 98
Joined: Wed Oct 05, 2005 1:01 pm

Re: How to take care of conformed dimensions

Post by djoni »

Dimensions don't have a grain, that's a fact characteristic. Dimensions are things like customers, products, people, buildings. Things you can walk up to and kick. The Time dimension is the closest thing to having a series of grains. You could have a Year, Day, Quarter, Period, etc family of tables that make up the Time subject area, but it's still a set of dimension tables.
Well, if you look at Kimball's definition (see, for example, his book, The Data Warehouse Toolkit, 2nd edition, page 82- 87; more specifically 3rd paragraph of page 83) dimensions do have grain.
Last edited by djoni on Tue Sep 04, 2007 1:05 pm, edited 2 times in total.
djoni
Participant
Posts: 98
Joined: Wed Oct 05, 2005 1:01 pm

Re: How to take care of conformed dimensions

Post by djoni »

dsedi wrote:Hi Gurus

I am having a scenario in which two fact tables are sharing two set of conformed dimensions.

One is considered a daily fact, while the other a monthly.
Can this be possible, if the dimensions are conformed.

Correct me if I am wrong. Shouldnt the conformed dimensions be at the same granularity and so should be daily or monthly, not a mix of both.

If no, cant we take care of this by using the refrential integrity between the fact and dimensions, and need not change much.

Or do we need to design something differnt to achieve this.

Thanks
dsedi.
Take a look at Kimball's book, The Data Warehouse Toolkit, 2nd edition, page 82 - 87.
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Is creating time variant snapshots or respective tables for conformed Dimensions and Conformed facts the answer to the above. Or is there any other approach to resolve this.

Can anyone also let me know if we need a holding area where we can hold one set of fact and conformed dimensions while the other set which is complete can be published.

Thanks
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Load frequency of the dimension is not really a concern. As said earlier, dimensions don't have grain, facts have grain. The issue you will have, is determining how the architecture of your data warehouse will account for differences in frequency of loads (data latency).

If you were to load a dimension daily, and you run the fact after the dimension build is complete it is possible for your fact to have a reference to a dimension row that does not exist yet. You can reject the row and reprocess the reject on the next fact run (which would come after the next dimension load, that would then have the row loaded). You could also build your ETL to account for late arriving dimensions. This would then insert missing dimensional rows (key, and surrogate key only) into the dimension. When the dimension is run aftterward it will update the row with the complete information.

You could try to find a way to synch up all of your timings, but as your data warehouse grows this will start to unravel. You can run your dimension loads more frequently, but as you do this you will start thrashing your data warehouse. Every fact and every customer may have a different requirement for fact latency, you are going to end up running your dimensions over and over again trying to keep up with those requirements. You have a simple sounding yet difficult architectural issue that your firm needs to consider how to handle. You should talk to your ETL architect about the issue and see what direction should be pursued. It is really dependent on how your existing processes are architected.
Post Reply