Finding max value for a column using aggregator.
Moderators: chulett, rschirm, roy
Finding max value for a column using aggregator.
Hi
I have sort stage followed by aggregator stage. In aggregation I want to know row for max of particular column.
After execution, I am getting an abort with an error description as - "main_program: APT_StatReducer: field Name is already present in the output interface schema.".
Anyone has any idea, regarding which direction should I adopt this error?
I have sort stage followed by aggregator stage. In aggregation I want to know row for max of particular column.
After execution, I am getting an abort with an error description as - "main_program: APT_StatReducer: field Name is already present in the output interface schema.".
Anyone has any idea, regarding which direction should I adopt this error?
Thanks with regards,
videsh.
videsh.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I am not using sort / hash method in the aggregator.
Partitioning type I used is 'Same'. Previous stage employed 'Entire' partitioning.
As an input to Aggregator stage I have 2 columns as - Name / Year.
In aggregator -
Grouping is done as per - Name
In aggregations - type is 'Calculation'.
Column for calculation - 'Year'
Maximum value output column - 'Name' ... So I can get name of someone whose Year col has max value.
I am getting the same error as I mentioned already.
Thanks.
Partitioning type I used is 'Same'. Previous stage employed 'Entire' partitioning.
As an input to Aggregator stage I have 2 columns as - Name / Year.
In aggregator -
Grouping is done as per - Name
In aggregations - type is 'Calculation'.
Column for calculation - 'Year'
Maximum value output column - 'Name' ... So I can get name of someone whose Year col has max value.
I am getting the same error as I mentioned already.
Thanks.
Thanks with regards,
videsh.
videsh.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Yes you are. These are the only choices.videsh77 wrote:I am not using sort / hash method in the aggregator.
Without knowing which method you're using, it is fatuous to offer any suggestions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray
I overlooked Options category in aggregator properties.
Method = Hash.
Input of the aggregator I am having only 2 fields, which are used in the aggregation process.
Also, I have changed the column name at the o/p of aggregator stage to the other than what I have in i/p.
Thanks.
I overlooked Options category in aggregator properties.
Method = Hash.
Input of the aggregator I am having only 2 fields, which are used in the aggregation process.
Also, I have changed the column name at the o/p of aggregator stage to the other than what I have in i/p.
Thanks.
Thanks with regards,
videsh.
videsh.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The Hash aggregation method is appropriate when there aren't too many distinct grouping values - up to, say, 1000 per partition. This method does not require sorted input, so you could lose the Sort stage with no detriment.
If you are sorting for some other reason, change the aggregation method to Sort, but don't forget to assert in the Aggregator stage that the data are sorted on these particular keys.
The calculation of MAX then simply involves choosing the output column for calculation, and specifying the Max() function to apply to it.
Unfortunately, to get the value (or values - there may be more than one) from the ungrouped Name column associated with the particular row in which the maximum value of Year occurs is not readily achieved in an Aggregator stage.
Think about how you'd achieve it in SQL - you need a subquery, right? So in DataStage you're going to need some downstream processing once you've found the maximum Year value.
If you are sorting for some other reason, change the aggregation method to Sort, but don't forget to assert in the Aggregator stage that the data are sorted on these particular keys.
The calculation of MAX then simply involves choosing the output column for calculation, and specifying the Max() function to apply to it.
Unfortunately, to get the value (or values - there may be more than one) from the ungrouped Name column associated with the particular row in which the maximum value of Year occurs is not readily achieved in an Aggregator stage.
Think about how you'd achieve it in SQL - you need a subquery, right? So in DataStage you're going to need some downstream processing once you've found the maximum Year value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.