Page 1 of 1

Highest Key Generated

Posted: Wed Nov 29, 2006 12:11 am
by ajith
I need to capture the highest key generated during the execution of a surrogate key generator. It would be nice if somebody can give me a suggestion on this. I have a transformer after the Surrogate key generator.

Thanks in advance,

Ajith

Posted: Wed Nov 29, 2006 12:20 am
by kumar_s
The key generated is based on the number of rows been passed. So either if you know the number of rows that will be process through the transformer, you can get it, else, capture it after job or by using some other stage like, sort to get the max of it.

Posted: Wed Nov 29, 2006 9:54 am
by ajith
How to get the number of rows processed by surrogate key generator stage,
I want to write this into a file in the same job.

any ideas on this?

Posted: Wed Nov 29, 2006 10:24 am
by gbusson
capture the max value of the column with a second output of the transformer!

Posted: Wed Nov 29, 2006 1:33 pm
by vmcburney
It's tricky in a parallel job since each instance of the key generator will end on a different number, if you try to retrieve it in a transformer you will get a different answer for each node. Your problem is fixed in DataStage 8 which has settings for storing the surrogate key values in a file so it remembers previous values. Difficult to do in version 7.x. Almost impossible without doing some inefficient and time consuming aggregation. It is easier to write your rows to the target table and then pull back the maximum value with a SQL statement.

You might be able to write your own parallel stage to create a version of the surrogate key stage that is coming in version 8.

Posted: Wed Nov 29, 2006 10:28 pm
by ajith
vmcburney wrote:It's tricky in a parallel job since each instance of the key generator will end on a different number, if you try to retrieve it in a transformer you will get a different answer for each node. Your problem is fixed in DataStage 8 which has settings for storing the surrogate key values in a file so it remembers previous values. Difficult to do in version 7.x. Almost impossible without doing some inefficient and time consuming aggregation. It is easier to write your rows to the target table and then pull back the maximum value with a SQL statement.

You might be able to write your own parallel stage to create a version of the surrogate key stage that is coming in version 8.

I am not writing the data into a database, I am preparing the load files and I dont have access to the database. anyway, thanks for valuable inputs. Will try to get an alternate solution for this.

Posted: Wed Nov 29, 2006 11:08 pm
by ray.wurlod
Add a Copy stage after the Surrogate Key Generator stage. Take a second output from the Copy stage containing just the key column. Pass this through an Aggregator stage and form the Max. Force the Aggregator stage to operate in sequential mode. Its output will contain the highest value generated.

Posted: Thu Nov 30, 2006 2:08 am
by gbusson
Everybody tells about the aggregator.

But as the surrogate keys are "sorted", a tail command is efficient to get the hihest value!

Posted: Thu Nov 30, 2006 1:49 pm
by ray.wurlod
Not unless you collect them onto one partition. And even then how do you now that the last one collected is the highest value in the entire Data Set?

Posted: Fri Dec 01, 2006 3:04 am
by gbusson
When you generate a SK, unless you sort data after the generation, the last row has got the max value.

After you can merge data and sort the highest value.

The aim of the tail is to reduce volume.

use a tail stage

Posted: Sat Dec 02, 2006 1:45 pm
by wannabexpert
use a tail stage after the transformer are at the later stage to get the highest SK value.save tat into a dataset.it be useful for saving the highest values.

Posted: Sat Dec 02, 2006 2:34 pm
by ray.wurlod
gbusson wrote:When you generate a SK, unless you sort data after the generation, the last row has got the max value.

After you can merge data and sort the highest value.

The aim of the tail is to reduce volume.
That's an excellent performance tip!