surrogate ID generation in DS EE

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

It's been a long time since I rock and rolled. Actually I'm working on mainframe at the moment and we don't have access to DB2 so I cant test this solution anymore. As far as I remember, I simply switched the value of column IBMREQD between 'Y' and 'N' and did the sparse lookup again. Go for it, this is a simple solution to implement.

Code: Select all

IF IBMREQD ='Y' THEN 'N' ELSE 'Y' 
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

Hi Battaliou, Sree

I still have access to the code Battaliou is referring to. The problem with the Sparse lookup against a db2 sequence is that the lookup is expecting to do a SELECT WHERE and so requires a key value. If you use the IBMREQD as the key, say IBMREQD = 'Y', this works fine for the first row, but when the second row passes through it seems that the lookup uses the previously-returned value, presumably from a cache. Basically it's saying 'I've already returned that key so I do not need to go back to the database'. So as you have discovered, you always get the same value, rather than the NEXTVAL you need. As the lookup and db2 stage are tightly integrated and the number of properties available to change is relatively small, we have not discovered a way of avoiding this behaviour.

Battaliou's solution was to flip between two values for IBMREQD, ie rows have alternate 'Y' and 'N' which forces the lookup to hit the database for the next row as we require. This gives the right answer, but as with Sparse lookups in general, performance against non-trivial volumes is pretty dire and this solution has not been put into production.

What we prefer to do, where feasible, is to design the process so we can generate the key as the row is inserted into the table by incorporating a call to NEXTVAL in the User SQL for the INSERT statement. Where this does not fulfil the requirements we generate surrogate keys in the usual 'Datastage' manner.

regards to both

Phil Clarke.
swades
Premium Member
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

we generate surrogate keys in the usual 'Datastage' manner
Are you using Surrogate Key Generator Stage ?
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

Are you using Surrogate Key Generator Stage ?
No, we generally use a routine developed in house. I regret I am unable to share that with you.
Post Reply