Duplicate Surrogate Keys

Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Duplicate Surrogate Keys

Post by Raftsman »

While running a job on 2 nodes, I encountered duplicate surrogate keys. A previous colleague had set every stage in the job to run sequentially. To take advantage of parallel processing, I switched all stages back to Default (parallel). I used the NextSurrogateKey() function in a Transformer stage to create the keys. For some reason, a record on each node was assigned the same surrogate key. Is this a known bug, or do I need to structure the function call differently?

I was thinking about throwing out the Transformer stage and replacing it with the Surrogate Key Generator stage. Would this eliminate the problem?

Thanks in advance
Jim Stewart
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The Surrogate Key Generator stage will, like the Transformer stage, do exactly what you tell it to do, though its defaults are more likely to be well behaved.

In a Transformer stage, construct your expression using @PARTITIONNUM (plus any initial constant) as the initial value, and increment by @NUMPARTITIONS. Because each partition starts at a different offset and steps by the partition count, no two partitions can ever produce the same number.
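To make the arithmetic concrete, here is a minimal sketch in Python (not DataStage syntax) that simulates the scheme. The function names, the initial constant of 1, and the row counts are assumptions chosen for illustration; only the idea of starting at the partition number and stepping by the number of partitions comes from the description above.

```python
def partition_keys(partition_num, num_partitions, row_count, initial=1):
    """Keys one partition would assign: start at initial + partition_num,
    then step by num_partitions for each further row."""
    return [initial + partition_num + i * num_partitions
            for i in range(row_count)]


def all_keys(num_partitions, rows_per_partition, initial=1):
    """Collect the keys from every partition (hypothetical helper)."""
    keys = []
    for p in range(num_partitions):
        keys.extend(partition_keys(p, num_partitions,
                                   rows_per_partition[p], initial))
    return keys


# Two nodes: partition 0 handles 3 rows, partition 1 handles 2 rows.
keys = all_keys(2, [3, 2])
print(keys)
print("unique:", len(set(keys)) == len(keys))
```

With two partitions, partition 0 takes the odd numbers and partition 1 the even numbers, so the sequences never collide. If the partitions process unequal row counts, the combined key range will have gaps, which is normally acceptable for surrogate keys.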
Last edited by ray.wurlod on Tue Jan 06, 2009 3:30 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shankar_iyer
Participant
Posts: 5
Joined: Sun Jun 25, 2006 12:31 am
Location: Melbourne, Australia

Post by shankar_iyer »

I am not able to see Ray's full reply because it is "premium content". However, this can be solved by using @PARTITIONNUM and @NUMPARTITIONS.
Shankar Iyer
Business Analyst
Hewlett Packard
Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

What is the purpose of the NextSurrogateKey function if it can't cope with multiple nodes? Since this is a version 8.0 function, I would have assumed it works correctly. If I understand the internal mechanism, shouldn't it assign unique numbers even when multiple nodes are being used? I know it works correctly if I use one node or sequential processing. Can anyone elaborate on why this function doesn't work correctly? Is there a patch?

Thanks
Jim Stewart
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Reading the original post again, I see that the job was first run on a single node. I'm guessing, therefore, that something "sequential" has happened, maybe in the mechanism that initializes the state file or something within the function itself.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

It works fine on multiple nodes if the state file is good. In my initial testing, I found that the mechanism to update a state file didn't seem to work. I didn't pursue what looked to me to be a bug because it was just as easy to delete and recreate the state file.

Mike
verify
Premium Member
Posts: 99
Joined: Sun Mar 30, 2008 8:35 am

Post by verify »

I searched for the NextSurrogateKey() function in the DataStage help and the DataStage manuals, but I didn't find any information about it.
Can anyone please tell me the syntax, or where I can find this function?

Any help will be appreciated.
RK Raju
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Parallel Job Developer Guide, Appendix B under Utility functions.

Mike
verify
Premium Member
Posts: 99
Joined: Sun Mar 30, 2008 8:35 am

Post by verify »

I am using DataStage 7.5 Parallel Edition.
In the Parallel Job Developer Guide, Appendix B, under Utility functions, only one function is listed: "GetEnvironment()".

Is NextSurrogateKey() only present in the 8.0 edition?

Please help me out.
RK Raju
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Yes, we're talking about the 8.x release here.

Mike