RANDOM SURROGATE KEY

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jayantthakore
Participant
Posts: 42
Joined: Tue Jan 18, 2005 5:48 am

RANDOM SURROGATE KEY

Post by jayantthakore »

Hi,
I am generating unique key in transformer on the basis of @NUMPARTITION and @PARTITIONNUM.

Though it is generating unique keys but the order is random.Cause more columns are passing throgh 1 node.Its a 5 node file.
I have tried RoundRobin partition also.
Nothing happening!!!!!!!

I want to generate consecutive key values.
balajisr
Charter Member
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Hi jayantthakore

Try Surrogate key stage which will generate surrogate keys based on the number of partitions. You need to give a start value which can be hardcoded or from job parameter.
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

Jayant,

For generating the surrogate key, you can either use the Surrogate Key stage as Balaji said or as you have done, use the Transformer. Both have pros and cons.

If you have to use the surrogate key stage, then you will have to give the start value in the stage properties (i am not sure if this can be parameterised), which is not preferred by many.

Now the transformer method is the most preferred and used. But here there is the possibility that the generated keys might miss out a few number when @NUMPARTITION and @PARTITIONNUM are used, depending on the number of nodes in the configuration file. This cannot be avoided, as some of the nodes might have finished processsing their records, while some others might just be finishing. But the generated keys would be in a sequnce, according to the formula that you have given, except towards the end of the records where a few numbers might be missed out. If you do an order by on the Surrogate key column, then you will see this.

The surrogate key geneneration sequence that I use is -

Code: Select all

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM  - 1)) + 1 + SeedVal
where SeedVal is the last generated key. The 1 before that is for incrementing the generated keys by 1 for the next record.

Hope this helps.

Regards,

The Bird.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Surrogate keys are ONLY about uniqueness. There is no requirement that they form an unbroken sequence. If there are gaps, this does not detract from the fact that each is unique. Change your mindset.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

The randomness witnessed is because your target stage reads data from all(5 ?) nodes and writes the data sequentially in the order it receives it from those nodes. It does not sort the data.

As stated by Ray above, the keys will be unique. The round robin partitioning will minimize gaps but will not gaurantee an orderly write. you will still see randomness in the key order.

HTH
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
balajisr
Charter Member
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

mail2hfz
Premium Member
Premium Member
Posts: 92
Joined: Thu Nov 16, 2006 8:51 am

unique columns values in parallel transformer stage

Post by mail2hfz »

Hi
I tried to generate the column in parallel transfomr stage with unique values using stage variables .
here was my defination for stage variable


1) Create a stage variable for the counter, eg. SVCounter.
2) At the Stage Properties form set the Initial Value of the Stage Variable to "@PARTITIONNUM - @NUMPARTITIONS + 1".
3) Set the derivation of the stage variable to "svCounter + @NUMPARTITIONS". You can embed this in an IF statement if it is a conditional counter.

which i got from one of members in this forum ... thanks

but still i get error saying that
" dupliate identifier"
can you guys help me out.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You are missing initializing the surrogate key value with the result of SELECT MAX(Skey) FROM targettable
You then ADD the generated integer to this value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply