Page 1 of 1

Surrogate Key Generation

Posted: Mon Jan 17, 2005 3:46 pm
by venkatrranga
Hi,

We are planning to migrate 100+ tables in incremental mode. Out of which 25 of them are Fact and Rest of them are dimension tables. I am trying to come up with optimal mechanisam to produce Surrogate Key.

Is there any out of the box solution available from Ascential (like Oracle Sequences ) or Do I have to design my self a Surrogate Key table for each Extract I develop.

Please let me know your thoughts.

Regards
Venkat Ranga

Posted: Mon Jan 17, 2005 4:33 pm
by vmcburney
Yes, there is an out of the box solution of sorts. In DataStage Manager have a look at the Routines - sdk - KeyMgt folder for two examples of routines that manage surrogate key numbering. KeyMgtGetNextValue is for generating keys for a sequential load while KeyMgtGetNextValueConcurrent is for multiple instance or parallel loads.

The routines have comments in them to explain how they work.

Posted: Mon Jan 17, 2005 8:42 pm
by ray.wurlod
There's nothing to stop you using database sequences or SERIAL data types when loading the dimension tables.

Whatever mechanism you use to load the dimension tables, you will need to build a "reverse lookup" mechanism when loading the fact tables, because you will need to map the "business keys" encountered in the data you are loading to the surrogate keys you have used in the dimension tables. Hashed files are ideal for this purpose.

Re: Surrogate Key Generation

Posted: Wed Jan 19, 2005 10:36 am
by peternolan9
venkatrranga wrote:Hi,

We are planning to migrate 100+ tables in incremental mode. Out of which 25 of them are Fact and Rest of them are dimension tables. I am trying to come up with optimal mechanisam to produce Surrogate Key.

Is there any out of the box solution available from Ascential (like Oracle Sequences ) or Do I have to design my self a Surrogate Key table for each Extract I develop.

Please let me know your thoughts.

Regards
Venkat Ranga
Venkat,
please be aware that the one out of the box is very slow and suffers from locking problems as it updates the hash table for each key it fetches. We got some code to update 1000 keys at a time...I do not believe the code is in the public domain....