dis-advantages of SCD`s

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
pavanns
Participant
Posts: 27
Joined: Wed Sep 28, 2005 8:00 pm
Location: ca

dis-advantages of SCD`s

Post by pavanns »

Hi ,
can you think of some disadvantages of using SCD`s in real time scenario..
pavan
pavanns
Participant
Posts: 27
Joined: Wed Sep 28, 2005 8:00 pm
Location: ca

Post by pavanns »

on windows
pavan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Slowly changing dimensions are a mechanism for dealing with (dimension) data that may change over time.

Advantages/disadvantages just doesn't come into it!

If you need to preserve history in your DW - and I can assure you that you do, in some shape or form - then you will need to implement slowly changing dimension methodology.

That is, if your business requirements dictate, you must implement. And you will find that your business requirements do so dictate. For example, you (they) will want to keep track of your customers' name changes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
As Ray said SCD is a need not a storage method or partitioning methodology which are pure performance factors in your system.

The Use of SCD effects on performance are subject to increase of the data volume by having several entities with different time frames in the same table and the additional columns to distinct each entity instance in time from one another.

Though having to deal with SCDs does effect your programing or reports logic you can tune them in one way or another to satisfy your needs performance wise.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Taking an SCD type 2 table as an example there are three additional requirements over a normal table that take up resources. They store much more information, which impacts on (cheap) disk space, index size and performance. Every update to this table is two transactions, an insert of the new current row and an update to the old current row. Lookups against this table potentialy use date between clauses which are a lot slower then straight joins.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

In a real time scenario, an SCD can become a critical timing issue.

Imagine you have a queue of database transactions waiting to be processed. This queue contains rows of data that feed both dimension and fact tables. If the rows are time sequenced according to their updated time, then you can feel confident that you can read the data sequentially off the queue. If a queue message corresponds to a dimension, then process and commit it. If it corresponds to a fact, then also process and commit it.

However, if your dimensional data is coming in on a separate queue or on a the same queue but not time sequenced, then it is imperative that you take a serious look at your queue architecture. For example, if your dimension queue is fed from a customer management system and your facts are coming from a sales order processing system, you will want to make sure that your latest dimensional information is on hand so that when a fact comes thru, the appropriate surrogate key can be retrived when during a foreign key for foriegn surrogate key substitution.

You may consider "draining" the dimension queue completely, storing the last timestamp from the last row processed. Then, alternate over to the fact queue and "drain" it until you reach a fact that exceeds the last dimension timestamp. Now you will have to switch back over to the queue for the dimensions and again "drain" it until out of dimensions.

In the real world, you won't have a single queue getting all data from all systems, sorting it, and feeding it to you nicely. You will have to deal with collating multiple queues from the same source system because of queue management techniques to deal with processing spikes during the day on different tables.

There is not really a point of arguing against using SCD's. They serve specific purposes that solve specific data and time relevancy issues. Our tasks include implementing them in a way that allows their benefits to be realized. That's why there's type 1, type 2, type 3, the blend of type 1 and 2, etc.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply