Best Practice SCD

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Raftsman
Premium Member
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Best Practice SCD

Post by Raftsman »

The SCD stage requires two output files. One for changes and one containing all records. If we do not need the one containing all records, what is the best practice in order not to create files that will never be used. I would put them to the same file but multiple jobs may run at the same time and cause dead locks.

For those who are using the SCD stage, could you please provide some feedback.

Thanks in advance
Jim Stewart
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

It's interesting because it looks like the SCD stage is there to load fact tables with the update of dimensions as a byproduct. The Stage properties has "Output link" as a mandatory drop down combo box so you have to have a fact output whereas dimension update is not mandatory. This means you can only update dimension records where you have new facts against those dimension and you need to haul in all your dimension update fields with your fact records. This is a pain if you have 100 million facts and 6 dimensions with 10 attributes per dimension. You are processing 100 of millions of mostly duplicate dimension attributes.

I think a better way is to process each dimension individually with dummy facts by reading your dimension source table and then sending your fact output to a dead end copy stage. Then build you fact job without SCD stages to do normal dimension lookups and assume the required dimension rows will be there. That way you can refresh your dimensions each day.

I don't know who else has this issue but my star schemas are big - my dimensions have a lot of columns - I really do not think I should be tackling facts and dimensions updates in the one job.
Post Reply