Retaining First/Last Values of a Key Group

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gsbrown
Premium Member
Premium Member
Posts: 148
Joined: Mon Sep 23, 2002 1:00 pm
Location: USA

Retaining First/Last Values of a Key Group

Post by gsbrown »

In DataStage Server edition, it's very easy to process rows and retain only the first & last values of a column by grouped columns in an aggregator stage. I notice this isn't an option anymore in a parallel job aggregator stage.

How, in a parallel job, would you most efficiently go about handling this.
For example, if I have these records

Column1 Column2
A 5
A 10
A 15
A 20
B 50
B 45
B 40
B 35

I want this output with the first/last values of Column2 grouping Column1

Column1, First, Last
A,5,20
B,50,35

Thanks! I'm new to parallel so this is probably very simple for somebody here to advise me.
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Just try minimum and maximum (for strings) !! If it is a decimal then minimum/maximum value will be selected and not the first and last record.

But if you are sure that the first/last record will have the min/max decimal value, then you can still use minimum/maximum option in aggr stage.
Kandy
_________________
Try and Try again…You will succeed atlast!!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, but min != first and max != last. Wish I knew the answer, but that's not it. :(
-craig

"You can never have too many knives" -- Logan Nine Fingers
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Yes Craig, you are right ! It goes by alphabetical order for strings.
Kandy
_________________
Try and Try again…You will succeed atlast!!
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

gsbrown, in the actual scenario, do you have only 2 fields in input?
Kandy
_________________
Try and Try again…You will succeed atlast!!
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

gsbrown, in the actual scenario, do you have only 2 fields in input?
Kandy
_________________
Try and Try again…You will succeed atlast!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Aggregator stage itself will allow you to generate First and Last.

Moderator: please move to server forum
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Retaining First/Last Values of a Key Group

Post by chulett »

Please don't, Mr Moderator.
gsbrown wrote:In DataStage Server edition, it's very easy to process rows and retain only the first & last values of a column by grouped columns in an aggregator stage. I notice this isn't an option anymore in a parallel job aggregator stage.

How, in a parallel job, would you most efficiently go about handling this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Job type is marked as server

In a parallel job the First/Last functionality is provided by the Remove Duplicates stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
meet_deb85
Premium Member
Premium Member
Posts: 132
Joined: Tue Sep 04, 2007 11:38 am
Location: NOIDA

Post by meet_deb85 »

I wonder why people are not coming up with a solution ......


Here is one without any aggregator stage : -



SRC ------ > Copy stage ----------------------------------->Sort Stage_1
........................|.............................................................|
........................|..............................................Remove Duplicate stage1
........................|..............................................................|
..................Sort Stage2-->Remove Duplicate Stage2-->Join Stage-->o/p

Here in the Sort stage_1 and Sort stage_2 sort on the Key column Col1 and col2 in ascending order and then use Remove duplicate stages,but make sure that in the Remove Duplicate stage1 keep duplicate to retain as first
and in Remove Duplicate stage2 keep duplicate to retain as last.

Just join the output on the basis of column1 to get he desired output
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It would have been more efficient to sort once, either during extraction or ahead of the Copy stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

The recommended option will not work if they want to do first and last for some columns and max/min for some columns. and that's why my question was

gsbrown, in the actual scenario, do you have only 2 fields in input?
Kandy
_________________
Try and Try again…You will succeed atlast!!
Post Reply