Getting surrogate keys back from a db2 sequence

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
HSBCdev
Premium Member
Premium Member
Posts: 141
Joined: Tue Mar 16, 2004 8:22 am
Location: HSBC - UK and India
Contact:

Getting surrogate keys back from a db2 sequence

Post by HSBCdev »

I've got a job which uses a lookup to a db2 stage. The db2 stage is set as a sparse lookup and uses the user defined sql

SELECT (NEXTVAL for SEQUENCE1) as SURROGATEKEY
FROM SYSIBM.SYSDUMMY1

I've outputed the results to a file and found that I get the same value for the surrogate key on all the records on a particular node going into the lookup.

There are 3 nodes on database so I only get 3 different values returned over all the records going into the lookup.

How do I set this up so I get a unique value for each record?
HSBCdev
Premium Member
Premium Member
Posts: 141
Joined: Tue Mar 16, 2004 8:22 am
Location: HSBC - UK and India
Contact:

Post by HSBCdev »

Has anyone managed to use a database sequence in a parallel job to generate a unique value for each record being processed? Am I misunderstanding what a lookup to a db2 sequence should achieve in a parallel job?
HSBCdev
Premium Member
Premium Member
Posts: 141
Joined: Tue Mar 16, 2004 8:22 am
Location: HSBC - UK and India
Contact:

Post by HSBCdev »

I've now changed my lookup SQL to be as follows:


SELECT (NEXTVAL for SEQUENCE1) as SURROGATEKEY
FROM SYSIBM.SYSDUMMY1
WHERE ORCHESTRATE.INPUTFIELD > -1

This works fine when my input to the job is a row generator but gives me the original problem when I use a sequential file.

Any ideas why this is?
HSBCdev
Premium Member
Premium Member
Posts: 141
Joined: Tue Mar 16, 2004 8:22 am
Location: HSBC - UK and India
Contact:

Post by HSBCdev »

I've ended up using a server shared container within a PX job to do the lookup on the db2 sequence.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Why not pull a single value with a lookup dummy field, join it, use a surrogate key generator stage to generate surrogate key, and do the math (surrkey + generated_key).

This will be a MUCH more performant solution than what you are doing originally, and even far more so than what you are doing now.
HSBCdev
Premium Member
Premium Member
Posts: 141
Joined: Tue Mar 16, 2004 8:22 am
Location: HSBC - UK and India
Contact:

Post by HSBCdev »

There could be other jobs creating new surrogate keys on the same table at the same time. I presume the solution you suggest relies on this being the only job creating the key.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

WHERE ORCHESTRATE.INPUTFIELD > -1

Because you're working on a lookup, you have to have a WHERE clause, one that will always return a row. Elsewhere on the Forum (perhaps you found it in a search) I posted a similar solution for Oracle, from memory WHERE ROWNUM = :1 and provided 1 as the reference key expression.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Titto
Participant
Posts: 148
Joined: Tue Jun 21, 2005 7:49 am

Post by Titto »

HSBCdev wrote:There could be other jobs creating new surrogate keys on the same table at the same time. I presume the solution you suggest relies on this being the only job creating the key.
Hi HSBCDev,

I have same kind of requirement and i need the sequence number to write to a file so that there are other processes using this number down the line. I created container as you said but still it is creating the same number for all the row. I am using Oracle as a database.
Could you please share how did you achive this?

Any help from other is appreciated!

Thanks
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I don't like this design as the sparse lookup is a very expensive overhead on your job. You've got to ask what advantage you are getting from using a DB2 sequence when you have to retrieve it for every row. I think you would be better off generating keys within your job using a parallel counter, retrieve the maximum value from the DB2 sequence and pass it into your job as a parameter, update this sequence after you have loaded all rows.

We currently retrieve values from a DB2 sequence as we insert rows into tables by putting the NEXTVAL into the insert statement.
Titto
Participant
Posts: 148
Joined: Tue Jun 21, 2005 7:49 am

Post by Titto »

vmcburney wrote:I don't like this design as the sparse lookup is a very expensive overhead on your job. You've got to ask what advantage you are getting from using a DB2 sequence when you have to retrieve it for every row. I think you would be better off generating keys within your job using a parallel counter, retrieve the maximum value from the DB2 sequence and pass it into your job as a parameter, update this sequence after you have loaded all rows.

We currently retrieve values from a DB2 sequence as we insert rows into tables by putting the NEXTVAL into the insert statement.
Hi,

as the same table been used by other modules that is why we planned to use sequence from Oracle database.

but, could you please eloborate your last statement - "We currently retrieve values from a DB2 sequence as we insert rows into tables by putting the NEXTVAL into the insert statement" - how do you get the the sequence number while inserting, i am also expecting the same.
I need to write sequence number along with other fields to a seq file to use as input in other jobs.

your help is appreciated.

Thanks,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When using NEXTVAL in the INSERT statement you don't get the value back. That's the whole point. It's faster.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Titto
Participant
Posts: 148
Joined: Tue Jun 21, 2005 7:49 am

Post by Titto »

ray.wurlod wrote:When using NEXTVAL in the INSERT statement you don't get the value back. That's the whole point. It's faster.
Hi Ray,

but i need the latest Sequence number to write out to a file, can i use CURRVAL after the NEXTVAL (Insert statement) - in next stage, can it give the recently inserted value? please suggest me is it best way to do?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then you can't use NEXTVAL in the INSERT statement, and need to use some other mechanism.

CURRVAL is not available within the job, which is probably (a) using NEXTVAL on all partitions, (b) implementing row buffering, (c) not committing every row.

Probably the most efficient mechanism for your particular requirement is to lock the table, generate the keys within the job, and reset the sequence to the new "next value" when done before releasing the table-level lock.

Or you can use NEXTVAL in the INSERT statement provided that you have some other means to identify the rows (such as run batch identifier and/or timestamp) so that you can retrieve those rows from target for ongoing processing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply