Highest Key Generated

ajith · Post by **ajith** » Wed Nov 29, 2006 12:11 am

I need to capture the highest key generated during the execution of a surrogate key generator. It would be nice if somebody can give me a suggestion on this. I have a transformer after the Surrogate key generator.

Thanks in advance,

Ajith

kumar_s · Post by **kumar_s** » Wed Nov 29, 2006 12:20 am

The key generated is based on the number of rows been passed. So either if you know the number of rows that will be process through the transformer, you can get it, else, capture it after job or by using some other stage like, sort to get the max of it.

ajith · Post by **ajith** » Wed Nov 29, 2006 9:54 am

How to get the number of rows processed by surrogate key generator stage,
I want to write this into a file in the same job.

any ideas on this?

gbusson · Post by **gbusson** » Wed Nov 29, 2006 10:24 am

capture the max value of the column with a second output of the transformer!

vmcburney · Post by **vmcburney** » Wed Nov 29, 2006 1:33 pm

It's tricky in a parallel job since each instance of the key generator will end on a different number, if you try to retrieve it in a transformer you will get a different answer for each node. Your problem is fixed in DataStage 8 which has settings for storing the surrogate key values in a file so it remembers previous values. Difficult to do in version 7.x. Almost impossible without doing some inefficient and time consuming aggregation. It is easier to write your rows to the target table and then pull back the maximum value with a SQL statement.

You might be able to write your own parallel stage to create a version of the surrogate key stage that is coming in version 8.

ajith · Post by **ajith** » Wed Nov 29, 2006 10:28 pm

vmcburney wrote:It's tricky in a parallel job since each instance of the key generator will end on a different number, if you try to retrieve it in a transformer you will get a different answer for each node. Your problem is fixed in DataStage 8 which has settings for storing the surrogate key values in a file so it remembers previous values. Difficult to do in version 7.x. Almost impossible without doing some inefficient and time consuming aggregation. It is easier to write your rows to the target table and then pull back the maximum value with a SQL statement.

You might be able to write your own parallel stage to create a version of the surrogate key stage that is coming in version 8.

I am not writing the data into a database, I am preparing the load files and I dont have access to the database. anyway, thanks for valuable inputs. Will try to get an alternate solution for this.

ray.wurlod · Post by **ray.wurlod** » Wed Nov 29, 2006 11:08 pm

Add a Copy stage after the Surrogate Key Generator stage. Take a second output from the Copy stage containing just the key column. Pass this through an Aggregator stage and form the Max. Force the Aggregator stage to operate in sequential mode. Its output will contain the highest value generated.

gbusson · Post by **gbusson** » Thu Nov 30, 2006 2:08 am

Everybody tells about the aggregator.

But as the surrogate keys are "sorted", a tail command is efficient to get the hihest value!

ray.wurlod · Post by **ray.wurlod** » Thu Nov 30, 2006 1:49 pm

Not unless you collect them onto one partition. And even then how do you now that the last one collected is the highest value in the entire Data Set?

gbusson · Post by **gbusson** » Fri Dec 01, 2006 3:04 am

When you generate a SK, unless you sort data after the generation, the last row has got the max value.

After you can merge data and sort the highest value.

The aim of the tail is to reduce volume.

wannabexpert · Post by **wannabexpert** » Sat Dec 02, 2006 1:45 pm

use a tail stage after the transformer are at the later stage to get the highest SK value.save tat into a dataset.it be useful for saving the highest values.

ray.wurlod · Post by **ray.wurlod** » Sat Dec 02, 2006 2:34 pm

gbusson wrote:When you generate a SK, unless you sort data after the generation, the last row has got the max value.

After you can merge data and sort the highest value.

The aim of the tail is to reduce volume.

That's an excellent performance tip!