Page 1 of 1

SCD - Architecture

Posted: Fri Nov 06, 2009 1:53 pm
by Raftsman
What is the design architecture of the SCD? Is it the SCD stage a lookup with extra functionality or it is a join stage with extra functionality. We are not if we need to sort and repartition keys prior to entering the stage. Is the lookup done in memory?

We are not sorting and repartitioning key and for now, most of our jobs have returned correct value. Another collegue of mine ran into an issue where records did not much and created new surrogates.

What is the standard rule for using this stage.

Thanks

Posted: Fri Nov 06, 2009 2:31 pm
by chulett
I thought all that was explained fairly well in the Parallel Job Developer Guide pdf entry for that stage.

Posted: Fri Nov 06, 2009 2:36 pm
by Raftsman
So did I until I ran into serious issues with the stage. We have implemented this stage in all our EDW and now we are second guessing the way it works. I am just looking for a simple answer, does the stage do it lookup in memory or is it like a join. If it's like a join, then we have serious issues. We will need to go through all our module and insert partitioning and sorting.

BTW, I always refer to the manual prior to asking a question. The manuals are not the greatest.

Posted: Fri Nov 06, 2009 2:49 pm
by chulett
From the manual:
Each SCD stage processes a single dimension and performs lookups by using an equality matching technique. If the dimension is a database table, the stage reads the database to build a lookup table in memory.
There's also this:
Input data to SCD stages must accurately represent the order in which events occurred. You might need to presort your input data by a sequence number or a date field. If a job has multiple SCD stages, you must ensure that the sort order of the input data is correct for each stage.

If the SCD stage is running in parallel, the input data must be hash partitioned by key. Hash partitioning allows all records with the same business key to be handled by the same process. The SCD stage divides the dimension table across processes by building a separate lookup table for each process.
Does that help?

Posted: Fri Nov 06, 2009 3:13 pm
by Raftsman
So what does AUTO do. Would it not take care of this. We do not have to partition and presort the lookup stage. I would of assumed the same.

Posted: Fri Nov 06, 2009 3:28 pm
by chulett
While you don't "have to" partition / sort for the lookup stage, there are times when it is best to explicitly do that yourself rather than hoping the "auto" partitioning will get it right. Just went through that here.

Ray has a post here somewhere where he explains what "auto" partitioning does (and does not) do for you and from what I recall, when it's best to switch it off. I'll see if I can dig it up... unless Ray or Arnd beats me to it or answers it here. :wink:

Posted: Fri Nov 06, 2009 3:32 pm
by chulett
Here's one but don't think it's the one I was actually remembering, will poke around again later.

viewtopic.php?t=128688