Performance concern for SCD stage for very large dimension

longma98
Participant
Posts: 9
Joined: Thu Aug 14, 2003 5:12 pm

Post by longma98 »

Our company is getting version 8 very soon and is considering implementing type 2 SCD with the SCD stage. I have read that the SCD stage uses an in-memory lookup. If we have a monster type 2 dimension (I am talking about potentially hundreds of millions of rows with at least dozens of type 2 columns), is the SCD stage still a valid choice?

Has anyone used the SCD stage for a large-volume dimension table? What happens when the total dataset size is larger than the physically available memory?
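
To put rough numbers on the memory concern, here is a back-of-envelope sketch in Python. It is not a model of how the SCD stage actually lays out its lookup. The function names, the column width, the row count, and the 64 GiB server are all made-up assumptions for illustration, and it assumes every dimension row is held in the lookup.

```python
def estimate_lookup_bytes(rows, key_bytes, tracked_columns, avg_column_bytes,
                          per_row_overhead=0):
    """Rough size of an in-memory lookup holding one entry per dimension row.

    All inputs are assumptions supplied by the caller; nothing here is
    derived from DataStage internals.
    """
    row_bytes = key_bytes + tracked_columns * avg_column_bytes + per_row_overhead
    return rows * row_bytes


def fits_in_memory(estimate_bytes, available_bytes, headroom=0.5):
    """True if the estimate fits in the given fraction of available memory."""
    return estimate_bytes <= available_bytes * headroom


if __name__ == "__main__":
    # Hypothetical dimension: 200 million rows, an 8-byte business key,
    # 24 tracked columns averaging 20 bytes each.
    size = estimate_lookup_bytes(200_000_000, 8, 24, 20)
    server_ram = 64 * 1024**3  # hypothetical 64 GiB server
    print(f"estimated lookup size: {size / 1024**3:.1f} GiB")
    print("fits with 50% headroom:", fits_in_memory(size, server_ram))
```

Under those assumed figures the lookup alone would be far larger than the machine's memory, which is the scenario I am worried about.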

Thanks

LM
Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

Adding to your question, we have found the internal surrogate key generator in the SCD stage very slow for large volumes on the initial load. Is there a way to incorporate the Surrogate Key Generator stage into this mechanism? It is much more efficient.
Jim Stewart
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

Search the forum for CDC topics. That should give you more information on how to handle SCDs as well as to generate SKs.

HTH
--Rich
longma98
Participant
Posts: 9
Joined: Thu Aug 14, 2003 5:12 pm

Post by longma98 »

richdhan wrote:Hi,

Search the forum for CDC topics. That should give you more information on how to handle SCDs as well as to generate SKs.

HTH
--Rich
I don't have a problem using CDC or SK. My question is whether the new SCD stage is robust enough for us to throw some heavy stuff at it. The new SCD stage certainly looks very interesting, and will make coding and maintenance much easier.
If we don't have high confidence that it won't choke on large volumes, then we have to maintain two code bases until we can test it out during a later stage of the development cycle. That will make life a little more interesting for us.

LM