Key generation in a multiple instance job

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
NEO
Premium Member
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Key generation in a multiple instance job

Post by NEO »

Hi,
I have a job which retreives the max value of the key from a DB2 table and increments it in datastage by using @OUTROWNUM when loading data to the above DB2 table. This approach works fine when my job is the only job loading to the table. But this approach will fail if multiple jobs are loading to the table at the same time. But retreiving the max value from the table for every row sounds like a lot of load. And I am not quite sure how to exactly do that even if I wanted to.
Thanks,
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

You can use a Sequence in DB2 (I think they have Sequences) and get a new one for each insert.

Another approach would be to use a DataStage Sequence and acquire it with the KeyMgtGetNextValueConcurrent routine. There are ways to seed it, so that it starts with the correct value if you need that, but you should only have to do that once, unless values are inserted into the DB2 table from other sources than your DataStage jobs.

Good Luck,
Tony
NEO
Premium Member
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Post by NEO »

If I use the KeyMgtGetNextValueConcurrent routine, it gaurantees that if multiple instances of my job run, then the key is still unique. That gets me to the next question. If there are multiple jobs running in multiple instance loading to the same table, the uniqueness of the key cant be maintained unless all the different jobs use the same ID in the KeyMgtGetNextValueConcurrent routine when generating the key. If they use different IDs then it will be a problem. How are such scenarios handled?
Thanks.
tonystark622 wrote:You can use a Sequence in DB2 (I think they have Sequences) and get a new one for each insert.

Another approach would be to use a DataStage Sequence and acquire it with the KeyMgtGetNextValueConcurrent routine. There are ways to seed it, so that it starts with the correct value if you need that, but you should only have to do that once, unless values are inserted into the DB2 table from other sources than your DataStage jobs.

Good Luck,
Tony
kommven
Charter Member
Charter Member
Posts: 125
Joined: Mon Jul 12, 2004 12:37 pm

Post by kommven »

I have faced a similar scenario and I used same ID which is the target table key column name. and it solved my problem. Editing jobs and changing the key sequencer ID is much easier.
tomengers
Participant
Posts: 167
Joined: Tue Nov 19, 2002 12:20 pm
Location: Key West

Post by tomengers »

NEO ...

Be aware that KeyMgtGetNextValueConcurrent() locks the SDKSequences table for each number retrieved. If you have a volume job ... this will create a significant choke point.

... tom
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Tom,

What solution do you recommend for this problem?

Tony
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Whatever you do to guarantee unique IDs will involve single-threading, and therefore a "choke point", so you have to make it as quick as possible.

You could modify the SDK routine and make it more efficient. You could rely on a sequence or a SERIAL data type in the target database. If you're using integer instance numbers you could use an expression like

Code: Select all

startvalue + 1000000 * invocationID + @OUTROWNUM
(select an appropriate constant based on your load volumes; startvalue is obtained from the target table as the maximum value for which the invocation range is the one in question).

You can NOT use an approach that uses a variable defined to be in a COMMON area, as this is local to one process. Multiple instances run in separate processes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply