Partition,Sort,Group by in Aggregator Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ds_is_fun
Premium Member
Posts: 194
Joined: Fri Jan 07, 2005 12:00 pm


Post by ds_is_fun »

I'm confused about the way the Aggregator (AGGR) stage works. Let's assume we want to sort, group by and sum in PX with 2 CPUs.
Say we have a file:

Ename Empno Deptno Sal Mgr
John 100 10 1000 102
Smith 101 20 2000 102
Eric 102 10 3000 102
Raj 103 10 2000 101
Tom 104 30 2000 101
Dan 105 30 1000 101
Drew 106 20 2000 102

We partition by "Mgr" and choose the Sort method, as we would for a large file laid out like the one above.
We group by "Deptno".
We also need to calculate the Sum of Sal.

So, based on the above:

All rows that have Mgr = 101 go to CPU1 for processing.
Rows that have Mgr = 102 go to CPU2 for processing.
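To make the scenario concrete, here is a small Python simulation (not DataStage code) of how the sample rows would split across two partitions when partitioned on Mgr. The fixed Mgr-to-partition mapping simply mirrors the assumption stated in the question; a real hash partitioner chooses its own mapping.

```python
# Simulation only: illustrates key-based partitioning of the sample file.
# Columns: (Ename, Empno, Deptno, Sal, Mgr)
ROWS = [
    ("John", 100, 10, 1000, 102),
    ("Smith", 101, 20, 2000, 102),
    ("Eric", 102, 10, 3000, 102),
    ("Raj", 103, 10, 2000, 101),
    ("Tom", 104, 30, 2000, 101),
    ("Dan", 105, 30, 1000, 101),
    ("Drew", 106, 20, 2000, 102),
]

# Assumption taken from the question: Mgr 101 -> CPU1, Mgr 102 -> CPU2.
# Partition indexes 0 and 1 stand for CPU1 and CPU2.
MGR_TO_PARTITION = {101: 0, 102: 1}


def partition_by_mgr(rows):
    """Return one list of rows per partition, chosen by the Mgr column."""
    partitions = [[] for _ in MGR_TO_PARTITION]
    for row in rows:
        mgr = row[4]
        partitions[MGR_TO_PARTITION[mgr]].append(row)
    return partitions


if __name__ == "__main__":
    for index, part in enumerate(partition_by_mgr(ROWS)):
        print(f"CPU{index + 1}:", [row[0] for row in part])
```

Note that Deptno 10 ends up in both partitions (Raj on CPU1; John and Eric on CPU2).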

Is the sum calculation done at this point, separately on each processor? Or are the data sets from each processor then combined so that the group by and sum are done over all of them?
Please explain.
Thanks
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
The way I understand it:
You determine the hash key on which partitioning will be done.
Each partition is processed by one process on one node.
The most you will get is a concatenation of the per-partition results, unless you run sequentially or with a 1-node configuration.

You will not get a single summed result for a group whose rows are spread over more than one partition; you will get one result per partition. If you have constraints on the target, these repeated rows might be dropped while loading, resulting in wrong numbers.
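The behaviour described above can be illustrated with a small Python simulation (a hypothetical sketch, not DataStage's implementation). It assumes the same two-way split on Mgr as in the question, aggregates each partition on its own, and only concatenates the outputs. Deptno 10 has rows in both partitions, so it comes out as two rows instead of one total, while a one-node run returns one row per department.

```python
from collections import defaultdict

# Columns: (Ename, Empno, Deptno, Sal, Mgr), the rows from the question.
ROWS = [
    ("John", 100, 10, 1000, 102),
    ("Smith", 101, 20, 2000, 102),
    ("Eric", 102, 10, 3000, 102),
    ("Raj", 103, 10, 2000, 101),
    ("Tom", 104, 30, 2000, 101),
    ("Dan", 105, 30, 1000, 101),
    ("Drew", 106, 20, 2000, 102),
]


def aggregate(rows):
    """Group rows by Deptno and sum Sal, as one aggregator instance would."""
    sums = defaultdict(int)
    for _ename, _empno, deptno, sal, _mgr in rows:
        sums[deptno] += sal
    return sorted(sums.items())


def run_partitioned_by_mgr(rows):
    """Two partitions keyed on Mgr; each one is aggregated independently."""
    # Assumed mapping from the question: Mgr 101 -> partition 0, 102 -> 1.
    partitions = {0: [], 1: []}
    for row in rows:
        partitions[0 if row[4] == 101 else 1].append(row)
    # Results are only concatenated; nothing re-aggregates across partitions.
    return aggregate(partitions[0]) + aggregate(partitions[1])


def run_one_node(rows):
    """A single partition sees every row, so each Deptno is summed once."""
    return aggregate(rows)


if __name__ == "__main__":
    print("2 partitions on Mgr:", run_partitioned_by_mgr(ROWS))
    print("1-node run:", run_one_node(ROWS))
```

Here the partitioned run yields `(10, 2000)` and `(10, 4000)` as separate rows. If the target had a unique key on Deptno, one of them could be rejected on load, giving the kind of wrong numbers mentioned above.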

Bear in mind that EE is stricter, but sometimes our design mistakes are not caught by it and may cause unpredictable results.
I'm sure that once you find the design problem, the figures will add up.


Do check the table definitions and the partitioning.
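One way to check the partitioning is to partition on the grouping key itself, so every row of a Deptno group lands in the same partition. The sketch below is a hypothetical Python simulation; the `(deptno // 10) % 2` function is an arbitrary stand-in for a hash partitioner, not what DataStage actually computes. Concatenating the per-partition results then yields exactly one total per department.

```python
from collections import defaultdict

# Columns: (Ename, Empno, Deptno, Sal, Mgr), the rows from the question.
ROWS = [
    ("John", 100, 10, 1000, 102),
    ("Smith", 101, 20, 2000, 102),
    ("Eric", 102, 10, 3000, 102),
    ("Raj", 103, 10, 2000, 101),
    ("Tom", 104, 30, 2000, 101),
    ("Dan", 105, 30, 1000, 101),
    ("Drew", 106, 20, 2000, 102),
]

NUM_PARTITIONS = 2


def partition_for(deptno):
    """Hypothetical stand-in for a hash partitioner keyed on Deptno."""
    return (deptno // 10) % NUM_PARTITIONS


def aggregate(rows):
    """Group rows by Deptno and sum Sal within one partition."""
    sums = defaultdict(int)
    for _ename, _empno, deptno, sal, _mgr in rows:
        sums[deptno] += sal
    return sorted(sums.items())


def run_partitioned_by_deptno(rows):
    """Partition on the group key, aggregate each partition, concatenate."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partitions[partition_for(row[2])].append(row)
    output = []
    for part in partitions:
        output.extend(aggregate(part))  # still a plain concatenation
    return output


if __name__ == "__main__":
    print(run_partitioned_by_deptno(ROWS))
```

The output is still just a concatenation, but because no Deptno spans two partitions, each department appears once with its full total.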
Roy R.