Hi All,
I am trying to generate surrogate key or a sequence number for one of my job. As I understand there are 3 ways to do it.
Option # 1: Use Database Sequence if your target is a table. Unfortunately, I can't use this.
Option # 2: Use Surrogate key generator or the surrogate key tab in the stages like transformer and then use a Flat File or DB sequence. If I go for the flat file approach with "In blocks of : 1" the job is performing extremely slow. Its like taking 3 extra hours for processing 200K records. I am setting the block to 1 as I need the surrogate key to be generated sequentially when I am using 4 nodes. For DB Sequence, the only provided option are DB2 and Oracle. My XMETA is on DB2 and we also have Netezza for our DW but its not listed. How do I use the DB2 database for this?
Option # 3: Use the following derivation in the transformer. But to use this I am joining/looking up my target to fetch the max number generated in the previous run so as to continue the sequence. That's slowing down the job too.
@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + NullToValue(field_name, 1)
Is there any better option other than above where the performance doesn't take a hit; surrogate key is generated sequentially on a 4-node environment; and it also continues from the point where it stopped in the previous run?
Thanks
Surrogate Key using a state file or Database
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 34
- Joined: Thu Aug 28, 2014 8:48 pm
- Location: United States
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: Surrogate Key using a state file or Database
Why?collabxchange wrote:I need the surrogate key to be generated sequentially
A surrogate key guarantees uniqueness - nothing else. A bigger block size would be my main recommendation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 34
- Joined: Thu Aug 28, 2014 8:48 pm
- Location: United States
That is true. I guess it was a requirement coming from a database person who has used DB sequence always but I agree with you. Would you recommend a specific block size of do you recommend the system selected block size?
Also, is there a way to tell DS to just maintain it in the XMETA database by default instead of maintaining it as a state file?
The other question I have on State File is that is it locked by the job when it is currently running? Say for e.g. I have 1 state file for Table A. Now Table A can be populated via 2 different jobs that can run in parallel. Can both these jobs uses this single state file at the same time?
Also, is there a way to tell DS to just maintain it in the XMETA database by default instead of maintaining it as a state file?
The other question I have on State File is that is it locked by the job when it is currently running? Say for e.g. I have 1 state file for Table A. Now Table A can be populated via 2 different jobs that can run in parallel. Can both these jobs uses this single state file at the same time?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
System generated block size should be OK unless you have some particular reason for differing.
There is no convenient way to maintain state information in XMETA. (And that's not what XMETA is for.)
Updating the state file from two different jobs is, at the very least, dangerous. But you can read (blocks of values) from the state file using as many jobs at a time as you desire.
There is no convenient way to maintain state information in XMETA. (And that's not what XMETA is for.)
Updating the state file from two different jobs is, at the very least, dangerous. But you can read (blocks of values) from the state file using as many jobs at a time as you desire.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 34
- Joined: Thu Aug 28, 2014 8:48 pm
- Location: United States
Thank you, Ray. I will use the state file in the jobs are see how it goes.
BTW, what is your opinion about using the inbuilt DB sequence? The only options are DB2 and Oracle. Since we have DB2 for XMETA repository, I can ask for a separate database in it for generating sequence. Will it provide any performance benefits as opposed to flat file approach?
BTW, what is your opinion about using the inbuilt DB sequence? The only options are DB2 and Oracle. Since we have DB2 for XMETA repository, I can ask for a separate database in it for generating sequence. Will it provide any performance benefits as opposed to flat file approach?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
As a DataStage Developer, I always prefer to generate unique numbers directly within DataStage itself.
Any time you reach out to a database for whatever reason, it:
Any time you reach out to a database for whatever reason, it:
- - creates extra database I/O operations
- adds complexity
- slows down performance
Choose a job you love, and you will never have to work a day in your life. - Confucius