approachs for generating surrogate key

cetzhbo · Post by **cetzhbo** » Sun May 18, 2008 1:21 pm

Hello Gurus,

in v7, there are 5 method to implement surrogate key:

(1) ETL approach (using surrogate key stage). but where do you maintain the latest surrogate key ?

(2) Database approach. bulk load natural keys into a database and into a key mapping table. but I for every new record with new keys, I must put them into key mapping table to get corresponding surrogate key. is that right?

(3) Using a lookup stage to retrieve new surrogate key values from a database via a sparse lookup. what is the source for lookup stage ? what is the key for lookup ?

(4) why not write records directly into dimension table with surrogate key defined as identity ?

(5) create surrogate key using column generate stage. so is there the same issue as ETL approach ? (ie. HOW TO maintain the latest surrogate key ). any other issues ?

I have summarised 5 approach for surrogate key. and leave some question for every approach. hope I can get reply for some gurus.

thanks very much!

ray.wurlod · Post by **ray.wurlod** » Sun May 18, 2008 3:19 pm

You omitted generating the surrogate key value in a Transformer stage.

vmcburney · Post by **vmcburney** » Sun May 18, 2008 5:50 pm

For an ETL surrogate key approach you can find the correct starting value by storing the last key used in a state file or key table or by retrieving the MAX key value from the target table. Incrementing can then be done across parallel partitions by the surrogate key stage or a transformer (or in version 8 in the SCD stage). The tricky part is when two different jobs are changing the same key - such as two sets of customer data from different source systems that are processed at the same type. You can handle this various ways such as concatenating source system ID to the end of the surrogate key for up to 9 distinct source systems.