Aggregation

cherry
Participant
Posts: 108
Joined: Sun Jul 10, 2005 1:35 am

Aggregation

Post by cherry »

Hi All,

Could someone help with the design of the following parallel job?

I have the following Columns:

Input
------
A B C D
1 878 001 004
1 999 002 003
2 789 005 004
2 996 003 007

My Output should be:

Group by A, with Sum(B), First(C) and First(D)

Output
--------

A B C D
1 1877 001 003
2 1785 003 004

Could someone explain how to achieve this logic in a Parallel job? Server jobs have a First function; what is the equivalent in Parallel jobs?

Best Regards
Cherry
vkhandel
Participant
Posts: 35
Joined: Wed Oct 04, 2006 12:12 am
Location: Pune

Re: Aggregation

Post by vkhandel »

Have you tried using the Aggregator stage?
You can hash-partition and sort the input of the Aggregator stage on field "A", and then specify the following properties in the stage:
Grouping key = A
Aggregation type = calculation
Column for calculation = B
Sum output column = B
Column for calculation = C
Sum output column = C
Column for calculation = D
Sum output column = D
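
To illustrate why the input is hash-partitioned on the grouping key, here is a minimal Python sketch (not DataStage code). The function name hash_partition and the partition count of 2 are assumptions made for the example. The idea is that every row with the same value of A lands in the same partition, so each partition can aggregate its groups without seeing the others.

```python
from collections import defaultdict

# Sample rows from the question: (A, B, C, D)
rows = [
    (1, 878, "001", "004"),
    (1, 999, "002", "003"),
    (2, 789, "005", "004"),
    (2, 996, "003", "007"),
]


def hash_partition(rows, key_index, num_partitions):
    """Assign each row to a partition based on a hash of its key column.

    Hypothetical helper for illustration; it only mimics the idea of
    hash partitioning, not the engine's actual algorithm.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


partitions = hash_partition(rows, key_index=0, num_partitions=2)

# Each value of A appears in exactly one partition, so a per-partition
# aggregation yields complete totals for every group it holds.
for pid, part in sorted(partitions.items()):
    print(pid, sorted({row[0] for row in part}))
```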
vkhandel
Participant
Posts: 35
Joined: Wed Oct 04, 2006 12:12 am
Location: Pune

Re: Aggregation

Post by vkhandel »

Sorry for the copy/paste error.
For C and D, you should use the Minimum Value function instead:

Column for calculation = C
Minimum Value output column = C
Column for calculation = D
Minimum Value output column = D
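
As a rough check of this suggestion, the following Python sketch (again, not DataStage code) applies sum to B and a pluggable reducer to C and D on the sample rows from the question. The function name aggregate is an assumption made for the example. On this data, taking the minimum reproduces the requested output, whereas taking the first row of each group in input order would not, so the two are not interchangeable in general.

```python
# Sample rows from the question: (A, B, C, D)
rows = [
    (1, 878, "001", "004"),
    (1, 999, "002", "003"),
    (2, 789, "005", "004"),
    (2, 996, "003", "007"),
]


def aggregate(rows, pick):
    """Group by A, sum B, and reduce C and D with `pick` (e.g. min)."""
    groups = {}
    for a, b, c, d in rows:
        group = groups.setdefault(a, {"B": 0, "C": [], "D": []})
        group["B"] += b
        group["C"].append(c)
        group["D"].append(d)
    return [
        (a, g["B"], pick(g["C"]), pick(g["D"]))
        for a, g in sorted(groups.items())
    ]


# C and D are kept as zero-padded strings of equal width, so string
# comparison orders them the same way as numeric comparison would.
by_min = aggregate(rows, min)
by_first = aggregate(rows, lambda values: values[0])

print(by_min)    # [(1, 1877, '001', '003'), (2, 1785, '003', '004')]
print(by_first)  # [(1, 1877, '001', '004'), (2, 1785, '005', '004')]
```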