Surrogate key generation problem

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
balajisr
Charter Member
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Surrogate key generation problem

Post by balajisr »

I have two links namely customers,maxcustomer.

customer link outputs all customer present in sequential file named customer.txt which has 1000 rows.
maxcustomer link outputs maximum customerid present in sequential file customer.txt.

I need to generate surrogate key for the customer dimension such that surrogate key starts from (maxcustomerid + 1).

i decided to use transformer to calculate surrogate key.So, in transformer i need to have maxcustomerid.But transformer takes only one link as input so i needed to merge customer link and maxcustomerlink.

For merging two links I decided to go for lookup stage.Lookup stage outputs all columns of customers + one column containing the maxcustomerid.In Lookup stage conditions dialog i had given lookfailure to "continue".The lookup suceeds for one row and i get maxcustomerid,but for rest of rows where lookup fails i get maxcustomerid to be 0.I want the maxcustomerid to be present in all rows so that in transformer stage i can generate surrogate key as (maxcustomerid + inrownumber)

Is my approach correct or am i making any basic mistake?
Please guide me.

Regards
Balaji
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Maximum customer id file should have a key column, which can be dummy column and get maximum customer id based on this dummy column. And Dummy column should exist in the Customer input as well and the value for this dummy column should be the same in Customers input and lookup so that maximum customer id would be available for all rows.

HTWH.

Regards
Saravanan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Probably the easiest-to-understand way is to pre-store the maximum value in an earlier job (perhaps into a DataSet). Then pick up this value is a Lookup stage that precedes the Transformer stage, delivering the same value for every row processed. However, if you're going to work in the parallel environment, how will you guarantee that unique surrogate key values are generated across all processing nodes?

Check out the Surrogate Key Generator stage; it solves most of your problems, including being able to set a starting value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
balajisr
Charter Member
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Thanks for your responses. It was very useful.

I have checked surrogate generator stage. It can accept values hardcoded or job parameter.

Values can be given to job parameter in design time or before the job run when it asks for value to be given. In both the cases i hardcode the value in the job parameter.

Is there any way i can get a value from the stage and assign to job parameter dynamically assuming the stage link outputs only one row and one column?

Regards
Balaji
Post Reply