SCD Clarification

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

SCD Clarification

Post by vamsi.4a6 »

Hi,

When I google it,i got the algorithm of SCD stage as below.


for each new record
get business key
check if business key exists in dimension
if so then
check if record has changed
if so then
if (type 1 change) then update dimension record
if (type 2 change) then expire dimension record and add new dimension record
end if
else
add new dimension record
end if
next record

I have few doubts regarding the above algorithm


1) As per my understanding i modified the above algorithm as below. Could anyone please check it out whether my understanding is correct or not?



for each new record in Primary link
get business key from Primary link
check if business key exists in dimension I.e reference link
if so then
check if record has changed in in dimension I.e reference link
if so then
if (type 1 change) then update dimension record I.e reference link
if (type 2 change) then expire dimension record I.e reference link and add new dimension record I.e reference link
end if
else
add new dimension record I.e reference link
end if
next record from Primary link


2) We did not use any surrogate key in the above algorithm anywhere and I know what is meant by surrogate key but I did not understand how it will impact our algorithm and what is the need for defining it in step2 of SCD Stage?

Could any one please clarify on the above two points?
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

What if you business key is a composite key containing 10 columns? Then you would have to update records based on 10 keys. The FACT will also have these 10 columns and assuming that you have 10 dimensions each with an average of 5 key columns, your FACT would end up with 50 keys! Add the measures to that and you have a table that would be a developer's nightmare.

It all depends on your data model is set up. The surrogate key is there to help improve performance, maintainability and help developers formulate more efficient queries in lesser time.

The truth remains that your model can live without the surrogate key but try asking it a question and it's going to get itself all tangled up.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

@jerome_rajan

I got ur point and Could any one please clarify on the first point?
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

What exactly are you trying to say with the following?

add new dimension record I.e reference link
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: SCD Clarification

Post by chulett »

vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

jerome_rajan wrote:What exactly are you trying to say with the following?

add new dimension record I.e reference link

Thanks for reply and I hope now it is clear

for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Re: SCD Clarification

Post by jerome_rajan »

chulett wrote:
vamsi.4a6 wrote:2) We did not use any surrogate key in the above algorithm anywhere
Clarify this statement for us, please. Do you mean you just didn't include discussion of the role of a surrogate key in your talk about the 'algorithm' involved or are you saying that you did not leverage a surrogate key in your actual dimension / fact modeling? Jerome is assuming the latter hence the concern about '50 keys' but I would find that lack (if it existed) rather... unusual.
That's correct Craig. The reason being that the only reason I can think of why one would want to design a SCD load without surrogate key is because of a model designed without SKs.

Thanks for reply and I hope now it is clear

for each new record in Primary link
get business key from Primary link
check if business key exists in reference link
if so then
check if record has changed in reference link
if so then
if (type 1 change) then update in reference link
if (type 2 change) then expire in reference link and add new dimension record in reference link
end if
else
add new dimension record in reference link
end if
next record from Primary link
How exactly do you intend to update the REFERENCE data? Are you not eventually doing the same thing as in your Algorithm 1? Only this time you would be using a business key rather than a surrogate key.

If I understand correctly,both are the same and somehow the 2nd Algorithm doesn't *SOUND* correct
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
Post Reply