Page 1 of 1

RANDOM SURROGATE KEY

Posted: Fri Dec 23, 2005 2:15 am
by jayantthakore
Hi,
I am generating unique key in transformer on the basis of @NUMPARTITION and @PARTITIONNUM.

Though it is generating unique keys but the order is random.Cause more columns are passing throgh 1 node.Its a 5 node file.
I have tried RoundRobin partition also.
Nothing happening!!!!!!!

I want to generate consecutive key values.

Posted: Fri Dec 23, 2005 2:26 am
by balajisr
Hi jayantthakore

Try Surrogate key stage which will generate surrogate keys based on the number of partitions. You need to give a start value which can be hardcoded or from job parameter.

Posted: Fri Dec 23, 2005 10:44 am
by thebird
Jayant,

For generating the surrogate key, you can either use the Surrogate Key stage as Balaji said or as you have done, use the Transformer. Both have pros and cons.

If you have to use the surrogate key stage, then you will have to give the start value in the stage properties (i am not sure if this can be parameterised), which is not preferred by many.

Now the transformer method is the most preferred and used. But here there is the possibility that the generated keys might miss out a few number when @NUMPARTITION and @PARTITIONNUM are used, depending on the number of nodes in the configuration file. This cannot be avoided, as some of the nodes might have finished processsing their records, while some others might just be finishing. But the generated keys would be in a sequnce, according to the formula that you have given, except towards the end of the records where a few numbers might be missed out. If you do an order by on the Surrogate key column, then you will see this.

The surrogate key geneneration sequence that I use is -

Code: Select all

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM  - 1)) + 1 + SeedVal
where SeedVal is the last generated key. The 1 before that is for incrementing the generated keys by 1 for the next record.

Hope this helps.

Regards,

The Bird.

Posted: Fri Dec 23, 2005 1:46 pm
by ray.wurlod
Surrogate keys are ONLY about uniqueness. There is no requirement that they form an unbroken sequence. If there are gaps, this does not detract from the fact that each is unique. Change your mindset.

Posted: Fri Dec 23, 2005 3:38 pm
by ameyvaidya
The randomness witnessed is because your target stage reads data from all(5 ?) nodes and writes the data sequentially in the order it receives it from those nodes. It does not sort the data.

As stated by Ray above, the keys will be unique. The round robin partitioning will minimize gaps but will not gaurantee an orderly write. you will still see randomness in the key order.

HTH

Posted: Fri Dec 23, 2005 10:48 pm
by balajisr

unique columns values in parallel transformer stage

Posted: Wed Feb 14, 2007 6:24 pm
by mail2hfz
Hi
I tried to generate the column in parallel transfomr stage with unique values using stage variables .
here was my defination for stage variable


1) Create a stage variable for the counter, eg. SVCounter.
2) At the Stage Properties form set the Initial Value of the Stage Variable to "@PARTITIONNUM - @NUMPARTITIONS + 1".
3) Set the derivation of the stage variable to "svCounter + @NUMPARTITIONS". You can embed this in an IF statement if it is a conditional counter.

which i got from one of members in this forum ... thanks

but still i get error saying that
" dupliate identifier"
can you guys help me out.

Posted: Wed Feb 14, 2007 6:28 pm
by ray.wurlod
You are missing initializing the surrogate key value with the result of SELECT MAX(Skey) FROM targettable
You then ADD the generated integer to this value.