Understanding the SCD stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Understanding the SCD stage

Post by fmou »

I've read the following over and over but still wasn't able to grasp how the SCD stage works, though I know the SCD concept rather well.
"The Slowly Changing Dimension (SCD) stage has a single input link, a single output link, a dimension reference link, and a dimension update link.

The SCD stage reads source data on the input link, performs a dimension table
lookup on the reference link, and writes data on the output link. The output link
can pass data to another SCD stage, to a different type of processing stage, or to
a fact table. The dimension update link is a separate output link that carries
changes for the dimension. You can perform these steps in a single job or a
series of jobs, depending on the number of dimensions in your database and
your performance requirements.

Each SCD stage processes a single dimension and performs lookups by using
an equality matching technique. If the dimension is a database table, the stage
reads the database to build an in memory lookup table of all the current
dimension entries.

If a match is found, the SCD stage updates rows in the dimension table to
reflect the changed data.

If a match is not found, the stage creates a new row in the dimension table. All
of the columns that are needed to create a new dimension row must be
present in the source data.

You can design one or more jobs to process dimensions, update the dimension table, and load the fact table.

Processing dimensions
You can create a separate job for each dimension, one job for all dimensions,
or several jobs, each of which has several dimensions.

Updating dimensions
You can update the dimension table as the job runs by linking the SCD stage
to a database stage, or you can update the dimension table later by sending
dimension changes to a flat file that you use in a separate job. Actual
dimension changes are applied to the lookup table in memory and are
mirrored to the dimension update link, giving you the flexibility to handle a
series of changes to the same dimension row.

Loading the fact table
You can load the fact table as the final step of the job that updates the last
dimension, or in a separate job."
My first question (based on above), about the "source data on the input link", I have the impression that it is of the fact table. Is that so?

Thanks
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

My first question (based on above), about the "source data on the input link"...
Here is an image, which should make it clearer.

Image
Post Reply