Hi,
When I google it,i got the algorithm of SCD stage as below.
for each new record
get business key
check if business key exists in dimension
if so then
check if record has changed
if so then
if (type 1 change) then update dimension record
if (type 2 change) then expire dimension record and add new dimension record
end if
else
add new dimension record
end if
next record
I have few doubts regarding the above algorithm
1) As per my understanding i modified the above algorithm as below. Could anyone please check it out whether my understanding is correct or not?
for each new record in Primary link
get business key from Primary link
check if business key exists in dimension I.e reference link
if so then
check if record has changed in in dimension I.e reference link
if so then
if (type 1 change) then update dimension record I.e reference link
if (type 2 change) then expire dimension record I.e reference link and add new dimension record I.e reference link
end if
else
add new dimension record I.e reference link
end if
next record from Primary link
2) We did not use any surrogate key in the above algorithm anywhere and I know what is meant by surrogate key but I did not understand how it will impact our algorithm and what is the need for defining it in step2 of SCD Stage?
Could any one please clarify on the above two points?
SCD Clarification
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 376
- Joined: Sat Jan 07, 2012 12:25 pm
- Location: Piscataway
What if you business key is a composite key containing 10 columns? Then you would have to update records based on 10 keys. The FACT will also have these 10 columns and assuming that you have 10 dimensions each with an average of 5 key columns, your FACT would end up with 50 keys! Add the measures to that and you have a table that would be a developer's nightmare.
It all depends on your data model is set up. The surrogate key is there to help improve performance, maintainability and help developers formulate more efficient queries in lesser time.
The truth remains that your model can live without the surrogate key but try asking it a question and it's going to get itself all tangled up.
It all depends on your data model is set up. The surrogate key is there to help improve performance, maintainability and help developers formulate more efficient queries in lesser time.
The truth remains that your model can live without the surrogate key but try asking it a question and it's going to get itself all tangled up.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
-
- Premium Member
- Posts: 376
- Joined: Sat Jan 07, 2012 12:25 pm
- Location: Piscataway
What exactly are you trying to say with the following?
add new dimension record I.e reference link
add new dimension record I.e reference link
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Re: SCD Clarification
Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
jerome_rajan wrote:What exactly are you trying to say with the following?
add new dimension record I.e reference link
Thanks for reply and I hope now it is clear
for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link
-
- Premium Member
- Posts: 376
- Joined: Sat Jan 07, 2012 12:25 pm
- Location: Piscataway
Re: SCD Clarification
That's correct Craig. The reason being that the only reason I can think of why one would want to design a SCD load without surrogate key is because of a model designed without SKs.chulett wrote:Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
How exactly do you intend to update the REFERENCE data? Are you not eventually doing the same thing as in your Algorithm 1? Only this time you would be using a business key rather than a surrogate key.Thanks for reply and I hope now it is clear
for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link
If I understand correctly,both are the same and somehow the 2nd Algorithm doesn't *SOUND* correct
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.