surrogate ID generation in DS EE

ray.wurlod · Post by **ray.wurlod** » Thu Sep 29, 2005 4:30 pm

Try searching the forum and on-line help. For example you will learn that there is a Surrogate Key Generator stage in DS EE. There are many other techniques within DS, which your search will uncover.

It mainly boils down to two questions for me. One is whether other agencies are likely to be updating the table at the same time. In that case always generate keys in the database, and ensure that both DS and the other agencies follow the same mechanism (to avoid duplicate keys). The other is the efficiency of the database mechanism. Recently I visited a site where they used a before-insert trigger to get the next value from an Oracle sequence to replace the key value (if given) in the INSERT prior to allowing the insert to proceed. It worked, but it wasn't very efficient.

vmcburney · Post by **vmcburney** » Thu Sep 29, 2005 5:34 pm

We use DB2 sequences and not DB2 identity columns. I am not sure which is better. If I had a choice I would go for ETL generated keys every time. We have issues with DB2 sequences, we have to fudge the insert SQL to make it retrieve NEXTVAL from the sequence to generate an id, it can be difficult to bulk load or import into these tables, we are constantly retrieving these keys back into jobs for lookups.

mangrick · Post by **mangrick** » Tue Oct 04, 2005 3:39 am

Ray and Vincent,

at our site we will probably have those auto-genetarted identity columns (DB2 z/OS).

I remember a project (using DS MVS cobol code generator) were we had to retrieve these generated SIDs using a DS external routine stage calling the identity_val_local() DB2 function. That was very laborious! I did'nt like it.

In another post from May 2005 both of you wrote of an "reverse hashed file lookup" mechanism.
How does this work in DS EE (AIX) against the DB2 z/OS. Is that approach documented somewhere?

Thank you very much!

ray.wurlod · Post by **ray.wurlod** » Tue Oct 04, 2005 4:05 am

Hashed files are not available in DS EE, but you could achieve the same thing with a Lookup to a Lookup File Set or anything that can populate a virtual Data Set.

mangrick · Post by **mangrick** » Tue Oct 04, 2005 9:05 am

My concern ist more how to get back the newly generated SID from the DB2 table in which i just inserted a row.

- with an explictely lookup against the (master) target table using the original / alternate key - if there is one
- with some kind of appendix to the target DB2 stage
- with an function call (like the external routine in DS MVS edition)
- ...

And yes, this should be done using as few as possible stages and as performant as possible.

Or migth it be better to have a kind of DB sequence per table (since V8 DB2 has this feature)?

ray.wurlod · Post by **ray.wurlod** » Tue Oct 04, 2005 3:58 pm

Getting it back will be just as much as a problem whether you use a sequence or an identity column (SID). Why do you think you need to do this?

If it's for a subsequent run, you can select them immediately prior (for example constrained by batch ID or run date).

vmcburney · Post by **vmcburney** » Tue Oct 04, 2005 5:48 pm

If you are talking about retrieving these keys to satisfy foreign key relationships of subsequent dependent loads then you just do a fresh database lookup to retrieve the new keys. If you want to get just the new IDs back then do as Ray suggests and filter on the process id or date fields that should be in your target table.

battaliou · Post by **battaliou** » Wed Oct 05, 2005 10:41 am

We faced a similar issue with NEXTVAL from DB2 in EE. My workaround was to have a text file called NEXTFILE and write some basic to READSEQ the value and pass it down to the job as a parameter. The job itself now has an aggregator which outputs the max value of StageVar NEXTVAL which is simply a counter:
NEXTVAL + ((@OUTROWNUM-1) * @NUMPARTITIONS) + @PARTITIONNUM + 1

Problem was when the job processed no rows, it blew the target, so at the end of my sequence I to checked that it still exists else used WRITESEQ to re establish NEXTVAL

vmcburney · Post by **vmcburney** » Wed Oct 05, 2005 5:43 pm

Okay, I was following you right up until that first fullstop. The idea of using DB2 sequences or id fields is that you don't need counters in your datastage job. Why would you be creating your own NEXTVAL value when the database is generating it?

battaliou · Post by **battaliou** » Wed Oct 12, 2005 9:48 am

The reason we had to hand code a NEXTVAL in EE for use against DB2 was that DB2 just gave us 1 single value for nextval, i.e. it didn't sequentially generate itself as expected. e.g I expect a sequence of values 1,2,3 but got 1,1,1

vmcburney · Post by **vmcburney** » Wed Oct 12, 2005 7:34 pm

Okay, I get you. We run nextval on every single insert row, so every surrogate is generated by DB2 and none are generated by the ETL job. It does give us unique values each time as the nextval command is supposed to increment the sequence.

battaliou · Post by **battaliou** » Thu Nov 03, 2005 6:20 am

Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)

The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.

The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again.

ray.wurlod · Post by **ray.wurlod** » Thu Nov 03, 2005 2:44 pm

Will this scale to N nodes for N > 2 ?

battaliou · Post by **battaliou** » Fri Nov 04, 2005 9:54 am

Yes, we are running 4 way.

Sreedhar · Post by **Sreedhar** » Wed Apr 04, 2007 10:36 am

Hi,

We have come across the same situation as you have been....

"Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)

The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.

The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again. "

I didn't get what you said as two DB2 stages how is your design could you please elaborate more on this.......

Thanks in advance.

DSXchange

surrogate ID generation in DS EE

DB2 nextval