surrogate ID generation in DS EE

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try searching the forum and on-line help. For example you will learn that there is a Surrogate Key Generator stage in DS EE. There are many other techniques within DS, which your search will uncover.

It mainly boils down to two questions for me. One is whether other agencies are likely to be updating the table at the same time. In that case always generate keys in the database, and ensure that both DS and the other agencies follow the same mechanism (to avoid duplicate keys). The other is the efficiency of the database mechanism. Recently I visited a site where they used a before-insert trigger to get the next value from an Oracle sequence to replace the key value (if given) in the INSERT prior to allowing the insert to proceed. It worked, but it wasn't very efficient.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

We use DB2 sequences and not DB2 identity columns. I am not sure which is better. If I had a choice I would go for ETL generated keys every time. We have issues with DB2 sequences, we have to fudge the insert SQL to make it retrieve NEXTVAL from the sequence to generate an id, it can be difficult to bulk load or import into these tables, we are constantly retrieving these keys back into jobs for lookups.
mangrick
Participant
Posts: 10
Joined: Fri May 28, 2004 6:09 am
Location: München

Post by mangrick »

Ray and Vincent,

at our site we will probably have those auto-genetarted identity columns (DB2 z/OS).

I remember a project (using DS MVS cobol code generator) were we had to retrieve these generated SIDs using a DS external routine stage calling the identity_val_local() DB2 function. That was very laborious! I did'nt like it.

In another post from May 2005 both of you wrote of an "reverse hashed file lookup" mechanism.
How does this work in DS EE (AIX) against the DB2 z/OS. Is that approach documented somewhere?

Thank you very much!
Regards,
Mathias Angrick
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Hashed files are not available in DS EE, but you could achieve the same thing with a Lookup to a Lookup File Set or anything that can populate a virtual Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mangrick
Participant
Posts: 10
Joined: Fri May 28, 2004 6:09 am
Location: München

Post by mangrick »

My concern ist more how to get back the newly generated SID from the DB2 table in which i just inserted a row.

- with an explictely lookup against the (master) target table using the original / alternate key - if there is one
- with some kind of appendix to the target DB2 stage
- with an function call (like the external routine in DS MVS edition)
- ... :?:

And yes, this should be done using as few as possible stages and as performant as possible.


Or migth it be better to have a kind of DB sequence per table (since V8 DB2 has this feature)?
Regards,
Mathias Angrick
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Getting it back will be just as much as a problem whether you use a sequence or an identity column (SID). Why do you think you need to do this?

If it's for a subsequent run, you can select them immediately prior (for example constrained by batch ID or run date).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If you are talking about retrieving these keys to satisfy foreign key relationships of subsequent dependent loads then you just do a fresh database lookup to retrieve the new keys. If you want to get just the new IDs back then do as Ray suggests and filter on the process id or date fields that should be in your target table.
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

DB2 nextval

Post by battaliou »

We faced a similar issue with NEXTVAL from DB2 in EE. My workaround was to have a text file called NEXTFILE and write some basic to READSEQ the value and pass it down to the job as a parameter. The job itself now has an aggregator which outputs the max value of StageVar NEXTVAL which is simply a counter:
NEXTVAL + ((@OUTROWNUM-1) * @NUMPARTITIONS) + @PARTITIONNUM + 1

Problem was when the job processed no rows, it blew the target, so at the end of my sequence I to checked that it still exists else used WRITESEQ to re establish NEXTVAL
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Okay, I was following you right up until that first fullstop. The idea of using DB2 sequences or id fields is that you don't need counters in your datastage job. Why would you be creating your own NEXTVAL value when the database is generating it?
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

The reason we had to hand code a NEXTVAL in EE for use against DB2 was that DB2 just gave us 1 single value for nextval, i.e. it didn't sequentially generate itself as expected. e.g I expect a sequence of values 1,2,3 but got 1,1,1
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Okay, I get you. We run nextval on every single insert row, so every surrogate is generated by DB2 and none are generated by the ETL job. It does give us unique values each time as the nextval command is supposed to increment the sequence.
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)

The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.

The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again.
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Will this scale to N nodes for N > 2 ?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

Yes, we are running 4 way.
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
Sreedhar
Participant
Posts: 187
Joined: Mon Oct 30, 2006 12:16 am

Post by Sreedhar »

Hi,

We have come across the same situation as you have been....

"Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)

The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.

The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again. "



I didn't get what you said as two DB2 stages how is your design could you please elaborate more on this.......

Thanks in advance.
Regards,
Shree
785-816-0728
Post Reply