SurrogateKey

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
iskapalli
Participant
Posts: 76
Joined: Wed Jul 25, 2007 12:36 am

SurrogateKey

Post by iskapalli »

Hi,

I have to generate SurrogateKey. In S2T they did not mention any thing.

1.NextSurrogateKey() function in transformer.
2. SurrogateKey stage
3.( (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1)

Can you suggest me which one is best approach in above list.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"Best" in what sense?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iskapalli
Participant
Posts: 76
Joined: Wed Jul 25, 2007 12:36 am

Post by iskapalli »

Hi ray,

in terms of performence.

and also could you plesae explain what is difference between three approches.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Define "performance" in an ETL context.

Rows/sec is not a good metric, due to the huge number of variables, not least row size, buffering, and the like.

NextSurrogateKey() function and Surrogate Key Generator stage are functionally identical - they get the next key value from a set maintained in either a state file or a database sequence.

The expression makes use of some system variables that are automatically assigned values.

If I were to be asked my opinion I would opt for the expression being quickest to execute, because the other two have to perform at least some I/O. But in a large data set there wouldn't be a lot in it, as caching effects would probably kick in. But I have no evidence upon which to base that opinion.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Keep in mind the fact that 'the expression' as show has no 'pick up where it left off last time' logic nor the ability to handle concurrency.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

True, but those are correctness issues, not performance issues! :wink:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply