approachs for generating surrogate key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
cetzhbo
Premium Member
Premium Member
Posts: 38
Joined: Tue Aug 28, 2007 10:20 am

approachs for generating surrogate key

Post by cetzhbo »

Hello Gurus,

in v7, there are 5 method to implement surrogate key:

(1) ETL approach (using surrogate key stage). but where do you maintain the latest surrogate key ?

(2) Database approach. bulk load natural keys into a database and into a key mapping table. but I for every new record with new keys, I must put them into key mapping table to get corresponding surrogate key. is that right?

(3) Using a lookup stage to retrieve new surrogate key values from a database via a sparse lookup. what is the source for lookup stage ? what is the key for lookup ?

(4) why not write records directly into dimension table with surrogate key defined as identity ?

(5) create surrogate key using column generate stage. so is there the same issue as ETL approach ? (ie. HOW TO maintain the latest surrogate key ). any other issues ?

I have summarised 5 approach for surrogate key. and leave some question for every approach. hope I can get reply for some gurus.

thanks very much!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You omitted generating the surrogate key value in a Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

For an ETL surrogate key approach you can find the correct starting value by storing the last key used in a state file or key table or by retrieving the MAX key value from the target table. Incrementing can then be done across parallel partitions by the surrogate key stage or a transformer (or in version 8 in the SCD stage). The tricky part is when two different jobs are changing the same key - such as two sets of customer data from different source systems that are processed at the same type. You can handle this various ways such as concatenating source system ID to the end of the surrogate key for up to 9 distinct source systems.
Post Reply