Slowly Changin Dimension

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
c341
Participant
Posts: 39
Joined: Mon Jan 26, 2004 8:43 pm

Slowly Changin Dimension

Post by c341 »

Hi
this is about Type 2 SCD...
i want to check the incoming records...whether it is already exist or not...if the record is a new one then insert into the database table...if the record already exit the update record should be inserted with new values and simultaneously it should update the previous record ( existing records)..
Eg...if i have Start Date and End Date....

ProductID StartDate EndDate
100 01/02/04 (will be null)(need to updated)
100 01/20/04 Null

A new value of existing record is inserted with the start date and endate as NULL, when this occurs the previous record's end date should be updated (it should be equal to the start date of the record inserted...)

Thank You
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I did not see a question here. Are you asking how to design the ETL jobs to support such a task?

I would start first with your modeling. You forgot in your model to include a surrogate key for ProductID, maybe ProductKey would be a good name. You also have chosen to use NULL as a startdate. This means that you will never be able to use SQL BETWEEN clauses to find data. Furthermore, since you did not include a surrogate key, then everywhere you need have a this dimension as a foreign key you will have the natural key. This means every join to the dimension will have to do a dimension join using a BETWEEN statement. This is the worst performing SQL you could choose.

You should design a dimension like:

Code: Select all

ProductKey   <pk>
ProductID     <natural key>
...
...
...
TVStartDate     <timestamp when your Time Variant row begins effectivity>
TVEndDate       <timestamp when your Time Variant row ends effectivity>
TVnumber        <incrementing counter of how many Time Variants exist for this natural key>
TVCurrentInd   <TRUE/FALSE indicator if this row is the most current>
You should also use a date waaaay out in the future for the TVEndDate on the most current row. As you create the next time variant row, you will stamp the prior current row with an ending effective date, as well as remove the current indicator.

Your surrogate key will be assigned to the natural key plus the start date. This establishes uniqueness. Everywhere that this dimension is a foreign key you should embed the surrogate key. This means that your joins will be pure Star joins, without BETWEEN statements (which your current design makes difficult), and supports natural indexes. You also gain the the power that the appropriate time variant dimension row aligns to the fact, because your ETL did the heavy lifting during transformation. This further supports the effort to enforce RI during ETL, and allow you the discretion to remove RI constraints in the database.

Thoughts?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Here's some posts that cover SCDs and their implementation using DataStage and alll the nasty stuff you need to think about:

viewtopic.php?t=86234

viewtopic.php?t=85542

viewtopic.php?t=85569

viewtopic.php?t=85822

viewtopic.php?t=86187

viewtopic.php?t=84998

viewtopic.php?t=86681

viewtopic.php?t=85970
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
c341
Participant
Posts: 39
Joined: Mon Jan 26, 2004 8:43 pm

Post by c341 »

Hi
I got a better idea now...let me implement this.....
Thank You very much
c41
vijay_reddy
Participant
Posts: 5
Joined: Wed Feb 11, 2004 11:22 pm

Post by vijay_reddy »

kcbland wrote:Here's some posts that cover SCDs and their implementation using DataStage and alll the nasty stuff you need to think about:

viewtopic.php?t=86234

viewtopic.php?t=85542

viewtopic.php?t=85569

viewtopic.php?t=85822

viewtopic.php?t=86187

viewtopic.php?t=84998

viewtopic.php?t=86681

viewtopic.php?t=85970
Hi,

These sites above mentioned are not anymore

thanks
vr
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes they are. :o

Maybe you've had network/internet problems?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

vijay_reddy wrote:These sites above mentioned are not anymore
I clicked on every one, and they opened a new Explorer. Perhaps you're blocking pop-ups?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vijay_reddy
Participant
Posts: 5
Joined: Wed Feb 11, 2004 11:22 pm

Post by vijay_reddy »

kcbland wrote:
vijay_reddy wrote:These sites above mentioned are not anymore
I clicked on every one, and they opened a new Explorer. Perhaps you're blocking pop-ups?
Hi Kenneth,

REally it did not opened , i tried n number of times.
Now it is opened, may be there was some problem with my net.

Thanks
vr
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I've seen something like that before.
It's documented at http://www.dilbert.com/comics/dilbert/a ... 522841.jpg
:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply