Retaining First/Last Values of a Key Group

gsbrown · Post by **gsbrown** » Wed Dec 24, 2008 9:55 am

In DataStage Server edition, it's very easy to process rows and retain only the first and last values of a column per group of key columns in an Aggregator stage. I notice this isn't an option anymore in the Aggregator stage of a parallel job.

How, in a parallel job, would you most efficiently go about handling this?
For example, if I have these records:

Column1 Column2
A 5
A 10
A 15
A 20
B 50
B 45
B 40
B 35

I want this output, with the first/last values of Column2 grouped by Column1:

Column1, First, Last
A,5,20
B,50,35

Thanks! I'm new to parallel so this is probably very simple for somebody here to advise me.

kandyshandy · Post by **kandyshandy** » Wed Dec 24, 2008 10:55 am

Just try minimum and maximum (for strings)! If it is a decimal, then the minimum/maximum value will be selected, not the first and last record.

But if you are sure that the first/last record will have the min/max decimal value, then you can still use the minimum/maximum option in the Aggregator stage.

chulett · Post by **chulett** » Wed Dec 24, 2008 11:01 am

Sorry, but min != first and max != last. Wish I knew the answer, but that's not it.
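To make the difference concrete, here is a minimal Python sketch (plain Python, not DataStage code) run over the sample rows from the original question. The names `rows`, `first_last`, and `min_max` are made up for illustration, and the sketch assumes the rows arrive already grouped by Column1.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the original post, in arrival order.
rows = [("A", 5), ("A", 10), ("A", 15), ("A", 20),
        ("B", 50), ("B", 45), ("B", 40), ("B", 35)]

first_last = {}
min_max = {}

# groupby only merges adjacent rows, so the input must already be
# grouped by key (it is here).
for key, group in groupby(rows, key=itemgetter(0)):
    values = [value for _, value in group]
    first_last[key] = (values[0], values[-1])    # by arrival order
    min_max[key] = (min(values), max(values))    # by value

print(first_last)  # {'A': (5, 20), 'B': (50, 35)}
print(min_max)     # {'A': (5, 20), 'B': (35, 50)}
```

Group A happens to give the same answer either way because its values arrive in ascending order; group B does not.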

kandyshandy · Post by **kandyshandy** » Wed Dec 24, 2008 12:11 pm

Yes Craig, you are right! It goes by alphabetical order for strings.

kandyshandy · Post by **kandyshandy** » Wed Dec 24, 2008 12:26 pm

gsbrown, in the actual scenario, do you have only 2 fields in input?

ray.wurlod · Post by **ray.wurlod** » Wed Dec 24, 2008 2:39 pm

The Aggregator stage itself will allow you to generate First and Last.

Moderator: please move to server forum

chulett · Post by **chulett** » Wed Dec 24, 2008 3:39 pm

Please don't, Mr Moderator.

gsbrown wrote:In DataStage Server edition, it's very easy to process rows and retain only the first & last values of a column by grouped columns in an aggregator stage. I notice this isn't an option anymore in a parallel job aggregator stage.

How, in a parallel job, would you most efficiently go about handling this.

ray.wurlod · Post by **ray.wurlod** » Wed Dec 24, 2008 6:22 pm

Job type is marked as server

In a parallel job the First/Last functionality is provided by the Remove Duplicates stage.

meet_deb85 · Post by **meet_deb85** » Wed Dec 24, 2008 10:37 pm

I wonder why people are not coming up with a solution...

Here is one without any Aggregator stage:

SRC --> Copy stage --> Sort Stage_1 --> Remove Duplicate stage1 --+
            |                                                     |
            |                                                     |
            +--> Sort Stage2 --> Remove Duplicate Stage2 --> Join Stage --> o/p

In Sort Stage_1 and Sort Stage2, sort on the key columns Col1 and Col2 in ascending order, then use the Remove Duplicates stages. Make sure that Remove Duplicate stage1 is set to retain the first duplicate and Remove Duplicate Stage2 is set to retain the last.

Then just join the two outputs on Column1 to get the desired output.
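The two-branch design can be sketched in plain Python as follows. This is only an illustration of the data flow, not DataStage code: `remove_duplicates` is a hypothetical helper standing in for the Remove Duplicates stage. One assumption differs from the description above and is worth stating loudly: the sketch sorts on Column1 only, using a stable sort, so that arrival order within each group is preserved. Sorting on Column2 as well would reorder the values and turn first/last into min/max.

```python
# Sample rows from the original post, in arrival order.
rows = [("A", 5), ("A", 10), ("A", 15), ("A", 20),
        ("B", 50), ("B", 45), ("B", 40), ("B", 35)]


def remove_duplicates(sorted_rows, retain):
    """Hypothetical stand-in for a Remove Duplicates stage.

    Keeps one value per key from rows already sorted by key;
    `retain` is either "first" or "last".
    """
    kept = {}
    for key, value in sorted_rows:
        if retain == "first":
            kept.setdefault(key, value)  # ignore later rows for this key
        else:
            kept[key] = value            # later rows overwrite earlier ones
    return kept


# Each branch sorts its own copy. Python's sorted() is stable, so rows
# with the same key keep their arrival order (assumption: key-only sort).
branch1 = sorted(rows, key=lambda r: r[0])
branch2 = sorted(rows, key=lambda r: r[0])

firsts = remove_duplicates(branch1, retain="first")
lasts = remove_duplicates(branch2, retain="last")

# Join the two branches on Column1.
joined = [(key, firsts[key], lasts[key]) for key in firsts]
print(joined)  # [('A', 5, 20), ('B', 50, 35)]
```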

ray.wurlod · Post by **ray.wurlod** » Wed Dec 24, 2008 10:54 pm

It would have been more efficient to sort once, either during extraction or ahead of the Copy stage.
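As a rough Python illustration of the sort-once idea (again not DataStage code, and the variable names are invented for the example), the rows are sorted a single time and both the first and the last value are then read from the same sorted stream. The sort is on Column1 only and relies on Python's stable `sorted()`, which is an assumption of this sketch.

```python
from itertools import groupby
from operator import itemgetter

# Same sample values as the original post, but interleaved so that the
# sort actually has work to do.
rows = [("A", 5), ("B", 50), ("A", 10), ("B", 45),
        ("A", 15), ("B", 40), ("A", 20), ("B", 35)]

# One stable sort on the key column; arrival order within a key is kept.
sorted_rows = sorted(rows, key=itemgetter(0))

result = []
for key, group in groupby(sorted_rows, key=itemgetter(0)):
    values = [value for _, value in group]
    result.append((key, values[0], values[-1]))

print(result)  # [('A', 5, 20), ('B', 50, 35)]
```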

kandyshandy · Post by **kandyshandy** » Thu Dec 25, 2008 9:14 am

The recommended option will not work if they want first and last for some columns and max/min for others, and that's why my question was:

gsbrown, in the actual scenario, do you have only 2 fields in input?
