Generating a surrogate key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
just4u_sharath
Premium Member
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Generating a surrogate key

Post by just4u_sharath »

source: sequential file
surrogate key generator
target : dataset

I have data in sequential file and i am generating surrogate key and loading the data to the dataset. Today i am loading 100 records and i got a sequence of 1 to 100 using surrogate key generator. Now the issue is that next day when i am loading another 100 records i need to get a sequence from 101. can you please help me with this issue

thanks
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Get the max of the table and feed that value as initial value to the surrogate generator stage.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
reddy.vinod
Participant
Posts: 36
Joined: Mon Jul 16, 2007 3:37 am
Location: USA

Re: Generating a surrogate key

Post by reddy.vinod »

just4u_sharath wrote:source: sequential file
surrogate key generator
target : dataset

I have data in sequential file and i am generating surrogate key and loading the data to the dataset. Today i am loading 100 records and i got a sequence of 1 to 100 using surrogate key generator. Now the issue is that next day when i am loading another 100 records i need to get a sequence from 101. can you please help me with this issue

thanks
using "KeyMgtGetNextValue" we can get unique values in server. Just it try in parallel.
VINOD
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The server routine "KeyMgtGetNextValue" won't work in PX unless you revert to using a BASIC transform stage and that is not recommended. The surrogate key generation is not designed to store the highest value used after a run for use in the next one. The use of a database sequence is usually the most efficient approach and the only drawback it has it that the generated surrogate key value is not returned to the program. If you don't need to know the value for additional processing in your job, then use your databases' sequence generators, otherwise you will need to get the highest value in the table as mentioned by DSGuru2b - but bear in mind that by using MAX you have no guarantee of concurrency and might get duplicate values when several processes are running at the same time.
Post Reply