How to implement SCD (slowly changing dimension)

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


raju_chvr
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Re: How to implement SCD (slowly changing dimension)

Post by raju_chvr »

What DB are you using? There is something called 'Transaction Handling', which can be enabled by checking the option 'Enable Transaction Handling'.

Maybe you can use that, or if you don't want to design it that way, go with two jobs: UPDATE in the first job and INSERT in the second.
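A minimal sketch of that two-pass idea, using Python and SQLite purely to illustrate the logic (the table name customer_dim and the column names are hypothetical, not from the post): the first loop plays the role of the UPDATE job, the second the INSERT job.

Code: Select all

import sqlite3

# Hypothetical target dimension table and incoming source rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer_dim VALUES (1, 'Alice', 'Austin')")

source_rows = [
    (1, "Alice", "Boston"),   # existing key -> should be updated
    (2, "Bob",   "Chicago"),  # new key      -> should be inserted
]

# "Job 1": UPDATE existing keys only; unmatched keys simply update zero rows.
for cust_id, name, city in source_rows:
    conn.execute(
        "UPDATE customer_dim SET name = ?, city = ? WHERE cust_id = ?",
        (name, city, cust_id),
    )

# "Job 2": INSERT keys that are still missing after the update pass.
for cust_id, name, city in source_rows:
    exists = conn.execute(
        "SELECT 1 FROM customer_dim WHERE cust_id = ?", (cust_id,)
    ).fetchone()
    if not exists:
        conn.execute(
            "INSERT INTO customer_dim (cust_id, name, city) VALUES (?, ?, ?)",
            (cust_id, name, city),
        )

conn.commit()
print(conn.execute("SELECT * FROM customer_dim ORDER BY cust_id").fetchall())
# [(1, 'Alice', 'Boston'), (2, 'Bob', 'Chicago')]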
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Yes, I have seen this problem many times. You need the benefit of a non-caching, instantly committed transaction without the performance overhead associated with such activity. If you look at the SCD question posted just a few days ago, where I listed numerous links to posts covering this topic, I describe at length the problems with designs exactly like yours.

DataStage Server has hash file technology that allows you to do all of this work off-target, meaning in a sandbox set up within hash file structures. You'll find that DataStage has high-performance reference and write capabilities to hash files, without all that messy database overhead. Even with read/write caching disabled, you'll find that performance is very good. Couple that with multiple clones (instances) of a job tackling the source data in a divide-and-conquer approach, and you'll get through your datasets quickly. Then you can separate inserts and updates into distinct load files, bulk load the inserts, and apply the updates afterwards.
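A minimal sketch of that split, with a plain Python dict standing in for the keyed hash file lookup (the file names and field names here are hypothetical): each source row is checked against the lookup, routed to a separate insert or update file, and the insert file would then be bulk loaded before the updates are applied.

Code: Select all

import csv

# Stand-in for the hash file: a keyed snapshot of the target dimension,
# loaded once up front so every lookup is an in-memory operation.
hash_file = {
    1: {"name": "Alice", "city": "Austin"},
}

source_rows = [
    {"cust_id": 1, "name": "Alice", "city": "Boston"},   # changed -> update
    {"cust_id": 2, "name": "Bob",   "city": "Chicago"},  # unknown -> insert
]

inserts, updates = [], []
for row in source_rows:
    existing = hash_file.get(row["cust_id"])
    incoming = {k: v for k, v in row.items() if k != "cust_id"}
    if existing is None:
        inserts.append(row)        # key not found: brand new dimension row
    elif existing != incoming:
        updates.append(row)        # key found but attributes changed

# Write the two streams to distinct load files; inserts.csv would be bulk
# loaded, updates.csv applied afterwards as ordinary UPDATE statements.
for path, rows in (("inserts.csv", inserts), ("updates.csv", updates)):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["cust_id", "name", "city"])
        writer.writeheader()
        writer.writerows(rows)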
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle