Aggregate function 'First' - parallel Aggregator stage

vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Hi,

We are in the process of converting our Server jobs into Parallel jobs. Some of the Server jobs use an Aggregator stage with the aggregate function 'First'.

When I convert these jobs to parallel jobs, I see no similar property in the parallel Aggregator stage to get a column's value from the first record in a group.

We need a way to get the first record's column value in a group when grouping with a parallel Aggregator stage.

Thanks.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Try the Remove Duplicates stage on the key columns, with the option to retain the first record.
Joshy George
bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai

Post by bkumar103 »

You can use the Remove Duplicates stage if the data is already sorted. There you can specify, based on the key columns, whether to keep the first or the last record.
If the data is not sorted, put a Sort stage before the Remove Duplicates stage to sort the data on the key columns.
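
As a rough illustration of the sort-then-keep-first idea, here is a minimal sketch in plain Python (not DataStage). The `first_per_key` helper and the sample `cust` and `city` columns are made up for this example; which record counts as "first" depends entirely on the sort order.

```python
from itertools import groupby
from operator import itemgetter


def first_per_key(rows, key):
    """Sort rows on the key column, then keep the first row of each key group.

    Hypothetical helper for illustration only; it mimics sorting on a key
    and retaining the first record per key value.
    """
    # sorted() is stable, so rows with the same key keep their input order.
    ordered = sorted(rows, key=itemgetter(key))
    # groupby() needs sorted input; next(group) takes the first row per key.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Sample data (assumed, not from the original jobs).
rows = [
    {"cust": "B", "city": "Pune"},
    {"cust": "A", "city": "Chennai"},
    {"cust": "B", "city": "Delhi"},
]

result = first_per_key(rows, "cust")
```

Here `result` holds one row per `cust` value, and for `"B"` it is the `"Pune"` row because that one came first in the input.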
vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Thanks for your suggestions. But I missed mentioning one point: we also need to do 'Sum' and 'Max' on certain columns. That is the reason we use an Aggregator here.

In a Server job's Aggregator, I could group by certain columns, get the 'Sum' and 'Max' for the columns we want, and use 'First' for the rest of the columns to get their values.

In a parallel job's Aggregator, we have properties for 'Sum' and 'Max', but we are stuck on the rest of the columns, where we just need to pass on the 'First' value.

Any thoughts?
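
For illustration only, the result being asked for can be sketched in plain Python (not DataStage). The `aggregate` function and the `cust`, `amount`, `score` and `city` columns are hypothetical; the sketch only shows the intended semantics of grouping with 'Sum', 'Max' and 'First' together, not how to build it in a parallel job.

```python
def aggregate(rows, group_keys, sum_cols, max_cols, first_cols):
    """Group rows and compute Sum, Max and First per group in one pass.

    Hypothetical helper for illustration; 'First' means the value from the
    first row of the group in input order.
    """
    groups = {}  # dicts keep insertion order, so groups appear in first-seen order
    for row in rows:
        key = tuple(row[c] for c in group_keys)
        if key not in groups:
            # First row of the group seeds every output column,
            # which also fixes the 'First' columns for good.
            cols = group_keys + sum_cols + max_cols + first_cols
            groups[key] = {c: row[c] for c in cols}
        else:
            out = groups[key]
            for c in sum_cols:
                out[c] += row[c]
            for c in max_cols:
                out[c] = max(out[c], row[c])
            # first_cols are left untouched on later rows.
    return list(groups.values())


# Sample data (assumed, not from the original jobs).
rows = [
    {"cust": "A", "amount": 10, "score": 3, "city": "Chennai"},
    {"cust": "A", "amount": 5, "score": 7, "city": "Delhi"},
    {"cust": "B", "amount": 2, "score": 1, "city": "Pune"},
]

summary = aggregate(rows, ["cust"], ["amount"], ["score"], ["city"])
```

For customer `"A"` this sums `amount`, takes the largest `score`, and passes through the `city` of the first row.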