Highest Key Generated

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ajith
Participant
Posts: 86
Joined: Thu Nov 10, 2005 11:10 pm

Highest Key Generated

Post by ajith »

I need to capture the highest key generated during the execution of a surrogate key generator. It would be nice if somebody can give me a suggestion on this. I have a transformer after the Surrogate key generator.

Thanks in advance,

Ajith
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The key generated is based on the number of rows been passed. So either if you know the number of rows that will be process through the transformer, you can get it, else, capture it after job or by using some other stage like, sort to get the max of it.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ajith
Participant
Posts: 86
Joined: Thu Nov 10, 2005 11:10 pm

Post by ajith »

How to get the number of rows processed by surrogate key generator stage,
I want to write this into a file in the same job.

any ideas on this?
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

capture the max value of the column with a second output of the transformer!
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

It's tricky in a parallel job since each instance of the key generator will end on a different number, if you try to retrieve it in a transformer you will get a different answer for each node. Your problem is fixed in DataStage 8 which has settings for storing the surrogate key values in a file so it remembers previous values. Difficult to do in version 7.x. Almost impossible without doing some inefficient and time consuming aggregation. It is easier to write your rows to the target table and then pull back the maximum value with a SQL statement.

You might be able to write your own parallel stage to create a version of the surrogate key stage that is coming in version 8.
ajith
Participant
Posts: 86
Joined: Thu Nov 10, 2005 11:10 pm

Post by ajith »

vmcburney wrote:It's tricky in a parallel job since each instance of the key generator will end on a different number, if you try to retrieve it in a transformer you will get a different answer for each node. Your problem is fixed in DataStage 8 which has settings for storing the surrogate key values in a file so it remembers previous values. Difficult to do in version 7.x. Almost impossible without doing some inefficient and time consuming aggregation. It is easier to write your rows to the target table and then pull back the maximum value with a SQL statement.

You might be able to write your own parallel stage to create a version of the surrogate key stage that is coming in version 8.

I am not writing the data into a database, I am preparing the load files and I dont have access to the database. anyway, thanks for valuable inputs. Will try to get an alternate solution for this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Add a Copy stage after the Surrogate Key Generator stage. Take a second output from the Copy stage containing just the key column. Pass this through an Aggregator stage and form the Max. Force the Aggregator stage to operate in sequential mode. Its output will contain the highest value generated.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

Everybody tells about the aggregator.

But as the surrogate keys are "sorted", a tail command is efficient to get the hihest value!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not unless you collect them onto one partition. And even then how do you now that the last one collected is the highest value in the entire Data Set?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

When you generate a SK, unless you sort data after the generation, the last row has got the max value.

After you can merge data and sort the highest value.

The aim of the tail is to reduce volume.
wannabexpert
Participant
Posts: 13
Joined: Mon Sep 11, 2006 8:01 am

use a tail stage

Post by wannabexpert »

use a tail stage after the transformer are at the later stage to get the highest SK value.save tat into a dataset.it be useful for saving the highest values.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

gbusson wrote:When you generate a SK, unless you sort data after the generation, the last row has got the max value.

After you can merge data and sort the highest value.

The aim of the tail is to reduce volume.
That's an excellent performance tip!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply