surrogate ID generation in DS EE
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Try searching the forum and on-line help. For example you will learn that there is a Surrogate Key Generator stage in DS EE. There are many other techniques within DS, which your search will uncover.
It mainly boils down to two questions for me. One is whether other agencies are likely to be updating the table at the same time. In that case always generate keys in the database, and ensure that both DS and the other agencies follow the same mechanism (to avoid duplicate keys). The other is the efficiency of the database mechanism. Recently I visited a site where they used a before-insert trigger to get the next value from an Oracle sequence to replace the key value (if given) in the INSERT prior to allowing the insert to proceed. It worked, but it wasn't very efficient.
It mainly boils down to two questions for me. One is whether other agencies are likely to be updating the table at the same time. In that case always generate keys in the database, and ensure that both DS and the other agencies follow the same mechanism (to avoid duplicate keys). The other is the efficiency of the database mechanism. Recently I visited a site where they used a before-insert trigger to get the next value from an Oracle sequence to replace the key value (if given) in the INSERT prior to allowing the insert to proceed. It worked, but it wasn't very efficient.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
We use DB2 sequences and not DB2 identity columns. I am not sure which is better. If I had a choice I would go for ETL generated keys every time. We have issues with DB2 sequences, we have to fudge the insert SQL to make it retrieve NEXTVAL from the sequence to generate an id, it can be difficult to bulk load or import into these tables, we are constantly retrieving these keys back into jobs for lookups.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Ray and Vincent,
at our site we will probably have those auto-genetarted identity columns (DB2 z/OS).
I remember a project (using DS MVS cobol code generator) were we had to retrieve these generated SIDs using a DS external routine stage calling the identity_val_local() DB2 function. That was very laborious! I did'nt like it.
In another post from May 2005 both of you wrote of an "reverse hashed file lookup" mechanism.
How does this work in DS EE (AIX) against the DB2 z/OS. Is that approach documented somewhere?
Thank you very much!
at our site we will probably have those auto-genetarted identity columns (DB2 z/OS).
I remember a project (using DS MVS cobol code generator) were we had to retrieve these generated SIDs using a DS external routine stage calling the identity_val_local() DB2 function. That was very laborious! I did'nt like it.
In another post from May 2005 both of you wrote of an "reverse hashed file lookup" mechanism.
How does this work in DS EE (AIX) against the DB2 z/OS. Is that approach documented somewhere?
Thank you very much!
Regards,
Mathias Angrick
Mathias Angrick
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Hashed files are not available in DS EE, but you could achieve the same thing with a Lookup to a Lookup File Set or anything that can populate a virtual Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
My concern ist more how to get back the newly generated SID from the DB2 table in which i just inserted a row.
- with an explictely lookup against the (master) target table using the original / alternate key - if there is one
- with some kind of appendix to the target DB2 stage
- with an function call (like the external routine in DS MVS edition)
- ...
And yes, this should be done using as few as possible stages and as performant as possible.
Or migth it be better to have a kind of DB sequence per table (since V8 DB2 has this feature)?
- with an explictely lookup against the (master) target table using the original / alternate key - if there is one
- with some kind of appendix to the target DB2 stage
- with an function call (like the external routine in DS MVS edition)
- ...
And yes, this should be done using as few as possible stages and as performant as possible.
Or migth it be better to have a kind of DB sequence per table (since V8 DB2 has this feature)?
Regards,
Mathias Angrick
Mathias Angrick
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Getting it back will be just as much as a problem whether you use a sequence or an identity column (SID). Why do you think you need to do this?
If it's for a subsequent run, you can select them immediately prior (for example constrained by batch ID or run date).
If it's for a subsequent run, you can select them immediately prior (for example constrained by batch ID or run date).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
If you are talking about retrieving these keys to satisfy foreign key relationships of subsequent dependent loads then you just do a fresh database lookup to retrieve the new keys. If you want to get just the new IDs back then do as Ray suggests and filter on the process id or date fields that should be in your target table.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
DB2 nextval
We faced a similar issue with NEXTVAL from DB2 in EE. My workaround was to have a text file called NEXTFILE and write some basic to READSEQ the value and pass it down to the job as a parameter. The job itself now has an aggregator which outputs the max value of StageVar NEXTVAL which is simply a counter:
NEXTVAL + ((@OUTROWNUM-1) * @NUMPARTITIONS) + @PARTITIONNUM + 1
Problem was when the job processed no rows, it blew the target, so at the end of my sequence I to checked that it still exists else used WRITESEQ to re establish NEXTVAL
NEXTVAL + ((@OUTROWNUM-1) * @NUMPARTITIONS) + @PARTITIONNUM + 1
Problem was when the job processed no rows, it blew the target, so at the end of my sequence I to checked that it still exists else used WRITESEQ to re establish NEXTVAL
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Okay, I was following you right up until that first fullstop. The idea of using DB2 sequences or id fields is that you don't need counters in your datastage job. Why would you be creating your own NEXTVAL value when the database is generating it?
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
The reason we had to hand code a NEXTVAL in EE for use against DB2 was that DB2 just gave us 1 single value for nextval, i.e. it didn't sequentially generate itself as expected. e.g I expect a sequence of values 1,2,3 but got 1,1,1
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Okay, I get you. We run nextval on every single insert row, so every surrogate is generated by DB2 and none are generated by the ETL job. It does give us unique values each time as the nextval command is supposed to increment the sequence.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)
The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.
The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)
The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.
The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again.
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Hi,
We have come across the same situation as you have been....
"Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)
The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.
The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again. "
I didn't get what you said as two DB2 stages how is your design could you please elaborate more on this.......
Thanks in advance.
We have come across the same situation as you have been....
"Got a solution to our NEXTVAL lookup problem. It requires two DB2 enterprise edition sparse lookups with identical sql i.e.
SELECT
IBMREQD, NEXTVAL FOR DM01.CDM_ARR_ID AS NEXTVAL
FROM
sysibm.sysdummy1
WHERE
(IBMREQD = ORCHESTRATE.IBMREQD)
The column IBMREQD is alternately set to 'Y' and 'N', and the data is sorted in the lookup to force this alternation e.g. my incoming data looks something like: <0,'N'> <1,'Y'> <2,'N'> <3,'Y'> <4,'N'> etc.
The lookup returns sequential numbers for every 'Y' record, so next step is to reverse the values of IBMREQD and to lookup again. "
I didn't get what you said as two DB2 stages how is your design could you please elaborate more on this.......
Thanks in advance.
Regards,
Shree
785-816-0728
Shree
785-816-0728