implementing a slowly changing Dim table

Posted: Sun Dec 11, 2005 6:59 pm
by Austin
Hi all,

I'm fairly new to the DW world and am working on an internship project for an internet marketing firm. I need to design a Product dimension table in a Server job, pulling data from 3 different sources (seq. file, Oracle table & Sybase table). This dimension table also has to be a Slowly Changing Dimension. My idea is to design one job for each data source, landing the data via an OCI stage in each of these 3 jobs, then build a 4th job to combine the data from these 3 jobs into one final table. What I'm not clear about is at what point & how I make the output table a Slowly Changing Dimension table. I've read about surrogate keys (SK) - how do I generate an SK in a DS Server job, and how does it help in making the table a Slowly Changing Dimension table?

Any help, hints would be greatly appreciated - Thanks

Austin.

Posted: Sun Dec 11, 2005 7:20 pm
by ray.wurlod
Welcome aboard! :D

You have a steep learning curve. The first step is to garner some understanding of what a slowly changing dimension is. There are many books out there - one of the best is The Data Warehouse Lifecycle Toolkit by Ralph Kimball and others.

Slowly changing dimension tables implement slowly changing dimensions, but you need to have decided what kind of SCD you need - Type 1 and Type 2 are the most common.

That decision in turn influences how you implement in DataStage - it's largely a matter of manipulating surrogate key values, and possibly some effective date columns.

If you search the forum for appropriate terms (don't miss SCD), you will get some ideas.

Once you've formulated a design, you might like to check back with another post here.

Another possibility is to wait six months or so, when there will be an SCD stage within the DataStage product.
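To make Ray's point about surrogate keys and effective-date columns concrete, here is a minimal Python sketch of Type 2 mechanics (this is an illustration, not DataStage code; the column names `sk`, `eff_from`, `eff_to`, `current` and the single tracked attribute `price` are assumptions for the example):

```python
from datetime import date

# Dimension rows: surrogate key, natural key, attribute, effective dates, flag.
dim = [
    {"sk": 1, "product_id": "P100", "price": "9.99",
     "eff_from": date(2005, 1, 1), "eff_to": None, "current": True},
]
next_sk = 2  # a surrogate key is just a meaningless, ever-increasing counter


def apply_scd2(incoming: dict, as_of: date) -> None:
    """Type 2: expire the current row and insert a new version with a new SK."""
    global next_sk
    current = next(
        (r for r in dim
         if r["product_id"] == incoming["product_id"] and r["current"]),
        None,
    )
    if current is not None and current["price"] == incoming["price"]:
        return                      # nothing changed, nothing to do
    if current is not None:         # expire the old version
        current["eff_to"] = as_of
        current["current"] = False
    dim.append({"sk": next_sk, "product_id": incoming["product_id"],
                "price": incoming["price"], "eff_from": as_of,
                "eff_to": None, "current": True})
    next_sk += 1


apply_scd2({"product_id": "P100", "price": "10.49"}, date(2005, 12, 11))
```

The point is that the natural key (`product_id`) can now appear many times in the dimension; fact rows join on the surrogate key, so history is preserved automatically.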

Posted: Sun Dec 11, 2005 8:09 pm
by Austin
ray.wurlod wrote:Welcome aboard! :D

You have a steep learning curve. The first step is ......
Hello Ray and thanks for the note. I forgot to mention that the requirement for this dim table is Type 2, wherein they want to create a new record for each change.
I've done some internet searching, came across a very good article on SCD by Kimball, and have some basic knowledge of the topic now. The idea is to add another column carrying a timestamp, so as to determine whether the record has changed since the last update. I just need help actually putting it together in DS.

Posted: Sun Dec 11, 2005 8:30 pm
by jenkinsrob
I suggest you have a look at the DataStage CRC32 function. You can use it to determine whether anything has changed on the record since your last extract.

What you are trying to do is a standard data warehousing requirement, and I am sure a search of this forum would return lots of helpful information.

Good Luck!
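The CRC32 idea above can be sketched in a few lines of Python (a stand-in for DataStage's CRC32 function; the column names and delimiter are illustrative assumptions):

```python
import zlib


def row_checksum(row: dict) -> int:
    """Concatenate the tracked columns in a fixed order and CRC32 the result.

    A row is considered changed when this value differs from the checksum
    stored at the previous extract. Illustrative sketch only - in DataStage
    you would build the same concatenated string and pass it to CRC32().
    """
    key_order = sorted(row)  # fixed column order so the checksum is stable
    payload = "|".join(str(row[c]) for c in key_order)
    return zlib.crc32(payload.encode("utf-8"))


old = {"product_id": 42, "name": "Widget", "price": "9.99"}
new = {"product_id": 42, "name": "Widget", "price": "10.49"}

changed = row_checksum(old) != row_checksum(new)
```

Storing one checksum per natural key (e.g. in a hashed file) is much cheaper than comparing every column on every run.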

scd 2

Posted: Mon Dec 12, 2005 12:06 am
by sasi_kumarreddy
Hi,

I have a small idea about SCD 2 - will you try this once:

    target
      |
      |
    lookup (hashed file)
      |
      |
    transformer ----------------- target
      |
    source

Take 3 instances of the same target:

the first to update the existing row (set its expiry date),
the second to insert a brand-new record,
the third to insert the updated version of a changed record.

Experts can correct me if I am wrong anywhere.
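The three-instance split above amounts to routing logic in the transformer. A Python stand-in (the dict plays the part of the hashed-file lookup; the key and attribute names are assumptions for the example):

```python
# Hashed-file lookup: natural key -> current value held in the dimension.
lookup = {"P100": "9.99", "P200": "5.00"}

expire_rows, new_rows, changed_rows = [], [], []

source = [
    {"product_id": "P100", "price": "9.99"},   # unchanged -> dropped
    {"product_id": "P200", "price": "5.25"},   # changed   -> expire + insert
    {"product_id": "P300", "price": "1.00"},   # not found -> insert new
]

for row in source:
    existing = lookup.get(row["product_id"])
    if existing is None:
        new_rows.append(row)                   # instance 2: brand-new record
    elif existing != row["price"]:
        expire_rows.append(row["product_id"])  # instance 1: set expiry date
        changed_rows.append(row)               # instance 3: insert new version
    # else: identical row, no output
```

Each list corresponds to one output link (one target instance) with its own constraint: lookup failed, lookup succeeded but values differ, or both for a changed row.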

Posted: Tue Dec 13, 2005 12:34 am
by Austin
I came up with this design: Source ------> Transformer -----> Target (Insert), another output from the Transformer for Target (Update), and then another OCI for Target (Copy) -----> Hash, which joins back to the Transformer as the lookup.

The problem is that I'll have to run the job initially with only Target (Insert), and then add the rest of the output stages (Update & Copy) for subsequent runs.

Posted: Tue Dec 13, 2005 4:57 am
by sasi_kumarreddy
Austin,

Do you mean that your target is empty at the start?

If so, then you don't need a separate initial job - use a constraint in the transformer: when the lookup against the (empty) hashed file fails, route the row to the insert target instance.

Posted: Tue Dec 13, 2005 8:08 am
by chulett
In other words, there's probably no reason to have a job design that is only good for the initial run and then has to be changed for any subsequent runs. :?

Unless, perhaps, you are in a position where the 'initial' load volume is significantly larger than the daily incremental loads or needs to be handled differently somehow and thus you could leverage two different job designs.

I'm afraid I don't follow what you've laid out as your job design, so you may need to post a picture of it - or try your hand at an ASCII art rendering using the code tags and much 'Preview' verification - to get more help with it.