implementing a slowly changing Dim table

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Austin
Participant
Posts: 7
Joined: Tue Nov 15, 2005 6:32 pm

implementing a slowly changing Dim table

Post by Austin »

Hi all,

I'm fairly new to the DW world and am working on an internship project for an internet marketing firm. I am to design a Product dimension table in a Server job, pulling in data from 3 different sources (seq. file, Oracle table & Sybase table). This dimension table also has to be a Slowly Changing Dimension table. My idea is to design one job for each data source and land the data through an OCI stage in each of these 3 jobs. Next I would make a 4th job to combine the data from these 3 jobs into one final table. What I am not clear about is at what point, and how, I make the output table a Slowly Changing Dimension table. I've read about surrogate keys (SK) - how do I generate an SK in a DS Server job, and how does it help in making the table a Slowly Changing Dimension table?

Any help, hints would be greatly appreciated - Thanks

Austin.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

You have a steep learning curve. The first step is to garner some understanding of what a slowly changing dimension is. There are many books out there - one of the best is The Data Warehouse Lifecycle Toolkit by Ralph Kimball and others.

Slowly changing dimension tables implement slowly changing dimensions, but you need to have decided what kind of SCD you need - Type 1 and Type 2 are the most common.

That decision in turn influences how you implement it in DataStage - it's largely a matter of manipulating surrogate key values, and possibly some effective date columns.
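For instance (these column, link and sequence names are purely illustrative, not taken from any real job), the derivations for a new Type 2 row in a Server Transformer might combine the SDK key-management routine with a pair of effective-date columns, roughly like this:

Code:

* Output-link derivations for a new Type 2 dimension row (names are examples only)
PRODUCT_SK     = KeyMgtGetNextValue("ProductDim")  ; * SDK surrogate key routine
EFFECTIVE_DATE = Oconv(Date(), "D-YMD[4,2,2]")     ; * today's date as YYYY-MM-DD
EXPIRY_DATE    = "9999-12-31"                      ; * open until superseded
CURRENT_FLAG   = "Y"

A Type 2 change then means generating a fresh surrogate key for the new row and updating the old row's EXPIRY_DATE (and flag), rather than overwriting it.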

If you search the forum for appropriate terms (don't miss SCD), you will get some ideas.

Once you've formulated a design, you might like to check back with another post here.

Another possibility is to wait six months or so, when there will be an SCD stage within the DataStage product.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Austin
Participant
Posts: 7
Joined: Tue Nov 15, 2005 6:32 pm

Post by Austin »

ray.wurlod wrote:Welcome aboard! :D

You have a steep learning curve. The first step is ......
Hello Ray, and thanks for the note. I forgot to mention that the requirement for this dim table is Type 2, wherein they want to create a new record for each change.
I've done some internet searching on it, came across a very good article on SCDs by Kimball, and have some basic knowledge of the topic now. The idea is to add another column holding a timestamp so as to determine whether the record has changed since the last update. I just need help in actually putting it together in DS.
jenkinsrob
Participant
Posts: 31
Joined: Mon Dec 01, 2003 6:24 am
Location: London

Post by jenkinsrob »

I suggest you have a look at the DataStage CRC32 function. You can use this to determine whether anything has changed on the record since your last extract.
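For example (the link and column names here are invented for illustration), a stage variable could hold the CRC32 of the incoming attribute columns, to be compared with a checksum stored on the dimension row looked up from a hashed file:

Code:

* Stage variable: checksum of the incoming Type 2 attributes
svNewCRC = CRC32(inLink.PROD_NAME : "|" : inLink.CATEGORY : "|" : inLink.PRICE)

* Constraint on the "changed row" output link:
* the key was found in the lookup, but the attributes now hash differently
NOT(lkpDim.NOTFOUND) AND svNewCRC <> lkpDim.ROW_CRC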

What you are trying to do is a standard data warehousing requirement, and I am sure a search of this forum would return lots of helpful information.

Good Luck!
sasi_kumarreddy
Participant
Posts: 27
Joined: Thu Aug 25, 2005 6:33 am

scd 2

Post by sasi_kumarreddy »

Hi,

I have a small idea about SCD 2 - will you try this once?

target
  |
  |
lookup (hashed file)
  |
  |
transformer ----------------- target
  |
  |
source

Take 3 instances of the same target (a constraint sketch follows the list below):

First one to update the existing row (set its expiry date).
Second to insert a new record.
Third to insert the updated record.
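
To make that concrete, the three output-link constraints could look roughly like this (lkpDim and svChanged are illustrative names: lkpDim is the hashed-file reference link and svChanged is a stage variable that detects a difference in the attribute values, for example via CRC32 as suggested above):

Code:

* Instance 1 - update the existing row (close it off with an expiry date)
NOT(lkpDim.NOTFOUND) AND svChanged

* Instance 2 - insert a brand-new record
lkpDim.NOTFOUND

* Instance 3 - insert the new version of the changed record
NOT(lkpDim.NOTFOUND) AND svChanged

Note that instances 1 and 3 deliberately share the same constraint: the same changed source row both expires the old version and inserts the new one.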

Experts can correct me if I am wrong at any place.
SASI
Austin
Participant
Posts: 7
Joined: Tue Nov 15, 2005 6:32 pm

Post by Austin »

I came up with a design of Source ------> Transf ------> Hash ------> Target (Insert), another output from the Transf for Target (Update), and then another OCI for Target (Copy) ------> Hash, which then joins back to the Transf.

The problem is that I'll have to run the job initially with only Target (Insert) and then add the rest of the output stages (Update & Copy).
sasi_kumarreddy
Participant
Posts: 27
Joined: Thu Aug 25, 2005 6:33 am

Post by sasi_kumarreddy »

Austin,

Do you mean that your target is empty at the start?
If so, then you can use a constraint in the transformer that checks for no match (null) from the target lookup on the insert target instance.
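In a Server Transformer that usually comes down to the reference link's NOTFOUND variable (the link name below is only an example):

Code:

* Constraint on the insert instance: no matching row in the hashed-file lookup
lkpTarget.NOTFOUND

On the very first run the hashed file is empty, so this fires for every row; on later runs it fires only for genuinely new products, which is why a single job design can cover both the initial and the incremental loads.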
SASI
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In other words, there's probably no reason to have a job design that is only good for the initial run and then has to be changed for any subsequent runs. :?

Unless, perhaps, you are in a position where the 'initial' load volume is significantly larger than the daily incremental loads or needs to be handled differently somehow and thus you could leverage two different job designs.

I'm afraid I don't follow what you've laid out as your job design, so you may need to post a picture of it - or try your hand at an ASCII art rendering using the code tags and much 'Preview' verification - to get more help with it.
-craig

"You can never have too many knives" -- Logan Nine Fingers