Hi ,
can you think of some disadvantages of using SCD`s in real time scenario..
dis-advantages of SCD`s
Moderators: chulett, rschirm, roy
dis-advantages of SCD`s
pavan
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Slowly changing dimensions are a mechanism for dealing with (dimension) data that may change over time.
Advantages/disadvantages just doesn't come into it!
If you need to preserve history in your DW - and I can assure you that you do, in some shape or form - then you will need to implement slowly changing dimension methodology.
That is, if your business requirements dictate, you must implement. And you will find that your business requirements do so dictate. For example, you (they) will want to keep track of your customers' name changes.
Advantages/disadvantages just doesn't come into it!
If you need to preserve history in your DW - and I can assure you that you do, in some shape or form - then you will need to implement slowly changing dimension methodology.
That is, if your business requirements dictate, you must implement. And you will find that your business requirements do so dictate. For example, you (they) will want to keep track of your customers' name changes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi,
As Ray said SCD is a need not a storage method or partitioning methodology which are pure performance factors in your system.
The Use of SCD effects on performance are subject to increase of the data volume by having several entities with different time frames in the same table and the additional columns to distinct each entity instance in time from one another.
Though having to deal with SCDs does effect your programing or reports logic you can tune them in one way or another to satisfy your needs performance wise.
IHTH,
As Ray said SCD is a need not a storage method or partitioning methodology which are pure performance factors in your system.
The Use of SCD effects on performance are subject to increase of the data volume by having several entities with different time frames in the same table and the additional columns to distinct each entity instance in time from one another.
Though having to deal with SCDs does effect your programing or reports logic you can tune them in one way or another to satisfy your needs performance wise.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Taking an SCD type 2 table as an example there are three additional requirements over a normal table that take up resources. They store much more information, which impacts on (cheap) disk space, index size and performance. Every update to this table is two transactions, an insert of the new current row and an update to the old current row. Lookups against this table potentialy use date between clauses which are a lot slower then straight joins.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
In a real time scenario, an SCD can become a critical timing issue.
Imagine you have a queue of database transactions waiting to be processed. This queue contains rows of data that feed both dimension and fact tables. If the rows are time sequenced according to their updated time, then you can feel confident that you can read the data sequentially off the queue. If a queue message corresponds to a dimension, then process and commit it. If it corresponds to a fact, then also process and commit it.
However, if your dimensional data is coming in on a separate queue or on a the same queue but not time sequenced, then it is imperative that you take a serious look at your queue architecture. For example, if your dimension queue is fed from a customer management system and your facts are coming from a sales order processing system, you will want to make sure that your latest dimensional information is on hand so that when a fact comes thru, the appropriate surrogate key can be retrived when during a foreign key for foriegn surrogate key substitution.
You may consider "draining" the dimension queue completely, storing the last timestamp from the last row processed. Then, alternate over to the fact queue and "drain" it until you reach a fact that exceeds the last dimension timestamp. Now you will have to switch back over to the queue for the dimensions and again "drain" it until out of dimensions.
In the real world, you won't have a single queue getting all data from all systems, sorting it, and feeding it to you nicely. You will have to deal with collating multiple queues from the same source system because of queue management techniques to deal with processing spikes during the day on different tables.
There is not really a point of arguing against using SCD's. They serve specific purposes that solve specific data and time relevancy issues. Our tasks include implementing them in a way that allows their benefits to be realized. That's why there's type 1, type 2, type 3, the blend of type 1 and 2, etc.
Imagine you have a queue of database transactions waiting to be processed. This queue contains rows of data that feed both dimension and fact tables. If the rows are time sequenced according to their updated time, then you can feel confident that you can read the data sequentially off the queue. If a queue message corresponds to a dimension, then process and commit it. If it corresponds to a fact, then also process and commit it.
However, if your dimensional data is coming in on a separate queue or on a the same queue but not time sequenced, then it is imperative that you take a serious look at your queue architecture. For example, if your dimension queue is fed from a customer management system and your facts are coming from a sales order processing system, you will want to make sure that your latest dimensional information is on hand so that when a fact comes thru, the appropriate surrogate key can be retrived when during a foreign key for foriegn surrogate key substitution.
You may consider "draining" the dimension queue completely, storing the last timestamp from the last row processed. Then, alternate over to the fact queue and "drain" it until you reach a fact that exceeds the last dimension timestamp. Now you will have to switch back over to the queue for the dimensions and again "drain" it until out of dimensions.
In the real world, you won't have a single queue getting all data from all systems, sorting it, and feeding it to you nicely. You will have to deal with collating multiple queues from the same source system because of queue management techniques to deal with processing spikes during the day on different tables.
There is not really a point of arguing against using SCD's. They serve specific purposes that solve specific data and time relevancy issues. Our tasks include implementing them in a way that allows their benefits to be realized. That's why there's type 1, type 2, type 3, the blend of type 1 and 2, etc.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle