Page 1 of 1

SurrogateKey

Posted: Tue Apr 03, 2012 8:21 am
by iskapalli
Hi,

I have to generate SurrogateKey. In S2T they did not mention any thing.

1.NextSurrogateKey() function in transformer.
2. SurrogateKey stage
3.( (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1)

Can you suggest me which one is best approach in above list.

Posted: Tue Apr 03, 2012 5:09 pm
by ray.wurlod
"Best" in what sense?

Posted: Tue Apr 03, 2012 8:14 pm
by iskapalli
Hi ray,

in terms of performence.

and also could you plesae explain what is difference between three approches.

Posted: Tue Apr 03, 2012 9:34 pm
by ray.wurlod
Define "performance" in an ETL context.

Rows/sec is not a good metric, due to the huge number of variables, not least row size, buffering, and the like.

NextSurrogateKey() function and Surrogate Key Generator stage are functionally identical - they get the next key value from a set maintained in either a state file or a database sequence.

The expression makes use of some system variables that are automatically assigned values.

If I were to be asked my opinion I would opt for the expression being quickest to execute, because the other two have to perform at least some I/O. But in a large data set there wouldn't be a lot in it, as caching effects would probably kick in. But I have no evidence upon which to base that opinion.

Posted: Wed Apr 04, 2012 8:22 am
by chulett
Keep in mind the fact that 'the expression' as show has no 'pick up where it left off last time' logic nor the ability to handle concurrency.

Posted: Wed Apr 04, 2012 3:19 pm
by ray.wurlod
True, but those are correctness issues, not performance issues! :wink: