
Aggregator Performance

Posted: Wed Sep 07, 2011 10:19 pm
by raju4u
Hi,

In this job we pass 190 million (19 crore) rows to an Aggregator stage, and it takes about 3 hours. The input to the Aggregator is sorted and hash-partitioned, and the Aggregator uses the Sort aggregation method. Please let me know whether there is any other way to reduce the run time.

Thanks,
Rajashekar.
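The Sort aggregation method mentioned above relies on input that is already sorted on the grouping keys. Because of that ordering, the stage can emit a group as soon as the key changes and only needs to hold one group in memory at a time. The Python sketch below shows that idea generically. It is not DataStage code, and `sort_method_aggregate` and the column names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, Iterator, Sequence, Tuple

Row = Dict[str, Any]

def sort_method_aggregate(
    rows: Iterable[Row], key_cols: Sequence[str], value_col: str
) -> Iterator[Tuple[Any, Any]]:
    """Sum value_col per group, streaming over input already sorted on key_cols.

    Hypothetical helper: only the current group is held in memory, and each
    group is emitted as soon as the key changes.
    """
    keyfunc = itemgetter(*key_cols)  # scalar for one column, tuple for several
    previous = None
    for key, group in groupby(rows, key=keyfunc):
        # groupby never yields the same key twice in a row, so a key smaller
        # than the previous one means the input was not sorted on key_cols.
        if previous is not None and key < previous:
            raise ValueError(f"input not sorted on {list(key_cols)}: {key!r} after {previous!r}")
        previous = key
        yield key, sum(r[value_col] for r in group)
```

If the input arrives unsorted, a streaming aggregator like this would either emit the same group several times or, as here, have to stop. That is why the sort order matters for this method.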

Re: Aggregator Performance

Posted: Wed Sep 07, 2011 11:41 pm
by SURA
What is the data volume, and how many columns are used in the aggregation?

Find out where most of the time is being spent.

Splitting the job may help reduce the time.

DS User

Posted: Thu Sep 08, 2011 12:27 am
by ray.wurlod
Please advise what the grouping columns for aggregation are.

Essentially, though, you need to partition on the first of these columns only (unless it has very few distinct values) and sort on all of them, in order, to be able to use Sort as the aggregation method.
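Partitioning on the first grouping column alone is enough because every row of a group shares that column's value, so the whole group lands in one partition. Each partition can then be sorted on all grouping columns and aggregated independently. The Python sketch below illustrates this under stated assumptions: `crc32` stands in for the partitioner's hash function (DataStage's actual hash is not shown here), and all function and column names are hypothetical.

```python
import zlib
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

def hash_partition(rows: Sequence[Row], col: str, num_partitions: int) -> List[List[Row]]:
    """Route each row to a partition by hashing ONE column (crc32 is an assumption)."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[col]).encode("utf-8")) % num_partitions
        parts[idx].append(row)
    return parts

def aggregate_partition(rows: Sequence[Row], key_cols: Sequence[str], value_col: str) -> Dict[Any, Any]:
    """Sort one partition on ALL grouping columns, in order, then sum per group."""
    keyfunc = itemgetter(*key_cols)
    ordered = sorted(rows, key=keyfunc)
    return {k: sum(r[value_col] for r in g) for k, g in groupby(ordered, key=keyfunc)}

def partitioned_aggregate(
    rows: Sequence[Row], key_cols: Sequence[str], value_col: str, num_partitions: int
) -> Dict[Any, Any]:
    """Partition on the first grouping column only, then aggregate each partition."""
    result: Dict[Any, Any] = {}
    for part in hash_partition(rows, key_cols[0], num_partitions):
        partial = aggregate_partition(part, key_cols, value_col)
        # Rows sharing the full key also share key_cols[0], so they all landed
        # in this partition; no group can be split across partitions.
        if not result.keys().isdisjoint(partial.keys()):
            raise RuntimeError("a group was split across partitions")
        result.update(partial)
    return result
```

The per-partition results can simply be concatenated, with no second aggregation pass, because no group spans two partitions.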

Posted: Tue Sep 13, 2011 2:53 am
by keshav0307
Did you try increasing the number of nodes?
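Adding nodes only helps if the hash-partitioning key has enough distinct values to spread the rows out. The job finishes only when its most loaded node does. The Python sketch below estimates that load under stated assumptions: `crc32` stands in for the real partitioner's hash, and `rows_per_node` and `busiest_node_share` are hypothetical names.

```python
import zlib
from collections import Counter
from typing import Hashable, Iterable, List

def rows_per_node(keys: Iterable[Hashable], num_nodes: int) -> List[int]:
    """Count how many rows each node would receive under hash partitioning (crc32 assumed)."""
    counts = Counter(zlib.crc32(str(k).encode("utf-8")) % num_nodes for k in keys)
    return [counts.get(node, 0) for node in range(num_nodes)]

def busiest_node_share(keys: Iterable[Hashable], num_nodes: int) -> float:
    """Fraction of all rows on the most loaded node; elapsed time tracks this node."""
    loads = rows_per_node(keys, num_nodes)
    total = sum(loads)
    return max(loads) / total if total else 0.0
```

For example, with a partition key that has only two distinct values, at most two of eight nodes receive any rows, and at least half the rows sit on one node. In that case, raising the node count does little for elapsed time.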

Posted: Tue Sep 13, 2011 9:50 am
by kommven
Compare a simple select job with the job that contains the Aggregator.
I assume the throughput from the source stage is a useful measure of your job's overall performance.

I would also suggest dumping the data into a Data Set and using that as the source, then comparing the results to see whether there is any opportunity for improvement.
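The suggested comparison amounts to measuring rows per second for a baseline that only reads the data, then for the same read plus the aggregation. The gap between the two is roughly the Aggregator's cost. In DataStage this would be two jobs. The Python sketch below only illustrates the measurement idea, and the stage functions `passthrough`, `aggregate` and `throughput` are hypothetical stand-ins.

```python
import time
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, int]

def passthrough(rows: Iterable[Pair]) -> int:
    """Baseline 'simple select': read every row and do nothing else."""
    count = 0
    for _ in rows:
        count += 1
    return count

def aggregate(rows: Iterable[Pair]) -> int:
    """Same read plus a sum per key (hypothetical stand-in for the Aggregator)."""
    totals = defaultdict(int)  # building totals is the extra work being timed
    count = 0
    for key, value in rows:
        totals[key] += value
        count += 1
    return count

def throughput(stage: Callable[[Iterable[Pair]], int], rows: List[Pair]) -> Tuple[int, float, float]:
    """Run one stage and return (rows processed, seconds, rows per second)."""
    start = time.perf_counter()
    count = stage(rows)
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate

if __name__ == "__main__":
    rows = [(f"k{i % 100}", i) for i in range(100_000)]
    for name, stage in (("select", passthrough), ("aggregate", aggregate)):
        n, secs, rate = throughput(stage, rows)
        print(f"{name}: {n} rows in {secs:.3f}s ({rate:,.0f} rows/s)")
```

If the two rates are close, the time is mostly going into reading the source rather than into the aggregation. Reading from a pre-built Data Set helps separate those two costs.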