SCD Clarification

vamsi.4a6 · Post by **vamsi.4a6** » Mon Apr 30, 2012 4:19 am

Hi,

When I google it,i got the algorithm of SCD stage as below.

for each new record
get business key
check if business key exists in dimension
if so then
check if record has changed
if so then
if (type 1 change) then update dimension record
if (type 2 change) then expire dimension record and add new dimension record
end if
else
add new dimension record
end if
next record

I have few doubts regarding the above algorithm

1) As per my understanding i modified the above algorithm as below. Could anyone please check it out whether my understanding is correct or not?

for each new record in Primary link
get business key from Primary link
check if business key exists in dimension I.e reference link
if so then
check if record has changed in in dimension I.e reference link
if so then
if (type 1 change) then update dimension record I.e reference link
if (type 2 change) then expire dimension record I.e reference link and add new dimension record I.e reference link
end if
else
add new dimension record I.e reference link
end if
next record from Primary link

2) We did not use any surrogate key in the above algorithm anywhere and I know what is meant by surrogate key but I did not understand how it will impact our algorithm and what is the need for defining it in step2 of SCD Stage?

Could any one please clarify on the above two points?

jerome_rajan · Post by **jerome_rajan** » Mon Apr 30, 2012 9:14 am

What if you business key is a composite key containing 10 columns? Then you would have to update records based on 10 keys. The FACT will also have these 10 columns and assuming that you have 10 dimensions each with an average of 5 key columns, your FACT would end up with 50 keys! Add the measures to that and you have a table that would be a developer's nightmare.

It all depends on your data model is set up. The surrogate key is there to help improve performance, maintainability and help developers formulate more efficient queries in lesser time.

The truth remains that your model can live without the surrogate key but try asking it a question and it's going to get itself all tangled up.

vamsi.4a6 · Post by **vamsi.4a6** » Mon Apr 30, 2012 10:06 am

@jerome_rajan

I got ur point and Could any one please clarify on the first point?

jerome_rajan · Post by **jerome_rajan** » Mon Apr 30, 2012 10:11 am

What exactly are you trying to say with the following?

add new dimension record I.e reference link

chulett · Post by **chulett** » Mon Apr 30, 2012 10:14 am

vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere

Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.

vamsi.4a6 · Post by **vamsi.4a6** » Mon Apr 30, 2012 10:17 am

jerome_rajan wrote:What exactly are you trying to say with the following?

add new dimension record I.e reference link

Thanks for reply and I hope now it is clear

for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link

jerome_rajan · Post by **jerome_rajan** » Mon Apr 30, 2012 10:39 am

chulett wrote:
vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.

That's correct Craig. The reason being that the only reason I can think of why one would want to design a SCD load without surrogate key is because of a model designed without SKs.

Thanks for reply and I hope now it is clear

for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link

How exactly do you intend to update the REFERENCE data? Are you not eventually doing the same thing as in your Algorithm 1? Only this time you would be using a business key rather than a surrogate key.

If I understand correctly,both are the same and somehow the 2nd Algorithm doesn't *SOUND* correct

DSXchange

SCD Clarification

SCD Clarification

Re: SCD Clarification

Re: SCD Clarification