Page 1 of 1

Surrogate key generation problem

Posted: Sun Aug 28, 2005 8:12 am
by balajisr
I have two links namely customers,maxcustomer.

customer link outputs all customer present in sequential file named customer.txt which has 1000 rows.
maxcustomer link outputs maximum customerid present in sequential file customer.txt.

I need to generate surrogate key for the customer dimension such that surrogate key starts from (maxcustomerid + 1).

i decided to use transformer to calculate surrogate key.So, in transformer i need to have maxcustomerid.But transformer takes only one link as input so i needed to merge customer link and maxcustomerlink.

For merging two links I decided to go for lookup stage.Lookup stage outputs all columns of customers + one column containing the maxcustomerid.In Lookup stage conditions dialog i had given lookfailure to "continue".The lookup suceeds for one row and i get maxcustomerid,but for rest of rows where lookup fails i get maxcustomerid to be 0.I want the maxcustomerid to be present in all rows so that in transformer stage i can generate surrogate key as (maxcustomerid + inrownumber)

Is my approach correct or am i making any basic mistake?
Please guide me.

Regards
Balaji

Posted: Mon Aug 29, 2005 1:47 am
by elavenil
Maximum customer id file should have a key column, which can be dummy column and get maximum customer id based on this dummy column. And Dummy column should exist in the Customer input as well and the value for this dummy column should be the same in Customers input and lookup so that maximum customer id would be available for all rows.

HTWH.

Regards
Saravanan

Posted: Mon Aug 29, 2005 2:05 am
by ray.wurlod
Probably the easiest-to-understand way is to pre-store the maximum value in an earlier job (perhaps into a DataSet). Then pick up this value is a Lookup stage that precedes the Transformer stage, delivering the same value for every row processed. However, if you're going to work in the parallel environment, how will you guarantee that unique surrogate key values are generated across all processing nodes?

Check out the Surrogate Key Generator stage; it solves most of your problems, including being able to set a starting value.

Posted: Mon Aug 29, 2005 9:25 am
by balajisr
Thanks for your responses. It was very useful.

I have checked surrogate generator stage. It can accept values hardcoded or job parameter.

Values can be given to job parameter in design time or before the job run when it asks for value to be given. In both the cases i hardcode the value in the job parameter.

Is there any way i can get a value from the stage and assign to job parameter dynamically assuming the stage link outputs only one row and one column?

Regards
Balaji