Page 1 of 1

SCD Clarification

Posted: Mon Apr 30, 2012 4:19 am
by vamsi.4a6
Hi,

When I google it,i got the algorithm of SCD stage as below.


for each new record
get business key
check if business key exists in dimension
if so then
check if record has changed
if so then
if (type 1 change) then update dimension record
if (type 2 change) then expire dimension record and add new dimension record
end if
else
add new dimension record
end if
next record

I have few doubts regarding the above algorithm


1) As per my understanding i modified the above algorithm as below. Could anyone please check it out whether my understanding is correct or not?



for each new record in Primary link
get business key from Primary link
check if business key exists in dimension I.e reference link
if so then
check if record has changed in in dimension I.e reference link
if so then
if (type 1 change) then update dimension record I.e reference link
if (type 2 change) then expire dimension record I.e reference link and add new dimension record I.e reference link
end if
else
add new dimension record I.e reference link
end if
next record from Primary link


2) We did not use any surrogate key in the above algorithm anywhere and I know what is meant by surrogate key but I did not understand how it will impact our algorithm and what is the need for defining it in step2 of SCD Stage?

Could any one please clarify on the above two points?

Posted: Mon Apr 30, 2012 9:14 am
by jerome_rajan
What if you business key is a composite key containing 10 columns? Then you would have to update records based on 10 keys. The FACT will also have these 10 columns and assuming that you have 10 dimensions each with an average of 5 key columns, your FACT would end up with 50 keys! Add the measures to that and you have a table that would be a developer's nightmare.

It all depends on your data model is set up. The surrogate key is there to help improve performance, maintainability and help developers formulate more efficient queries in lesser time.

The truth remains that your model can live without the surrogate key but try asking it a question and it's going to get itself all tangled up.

Posted: Mon Apr 30, 2012 10:06 am
by vamsi.4a6
@jerome_rajan

I got ur point and Could any one please clarify on the first point?

Posted: Mon Apr 30, 2012 10:11 am
by jerome_rajan
What exactly are you trying to say with the following?

add new dimension record I.e reference link

Re: SCD Clarification

Posted: Mon Apr 30, 2012 10:14 am
by chulett
vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.

Posted: Mon Apr 30, 2012 10:17 am
by vamsi.4a6
jerome_rajan wrote:What exactly are you trying to say with the following?

add new dimension record I.e reference link

Thanks for reply and I hope now it is clear

for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link

Re: SCD Clarification

Posted: Mon Apr 30, 2012 10:39 am
by jerome_rajan
chulett wrote:
vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.
That's correct Craig. The reason being that the only reason I can think of why one would want to design a SCD load without surrogate key is because of a model designed without SKs.

Thanks for reply and I hope now it is clear

for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link
How exactly do you intend to update the REFERENCE data? Are you not eventually doing the same thing as in your Algorithm 1? Only this time you would be using a business key rather than a surrogate key.

If I understand correctly,both are the same and somehow the 2nd Algorithm doesn't *SOUND* correct