
Aggregator performance

Posted: Wed Mar 25, 2009 11:07 pm
by vijay.barani
Hi friends,
May I have some information on the Aggregator stage? I have a job with 10 stages altogether.


                                   3 lookups
                                     | | |
source drs---->aggr---->trsfrmr----->trsfrmr----->target drs
                                      | |
                                 2 more targets


My source has approximately 10 lakh (1 million) records. In the Aggregator stage I apply a SUM() derivation to one column and group by 5 other columns. The job takes more than 2 hours 30 minutes. What might be the issue?

Re: Aggregator performance

Posted: Wed Mar 25, 2009 11:09 pm
by vijay.barani
The last Transformer stage has the 3 lookups and 3 targets, not the first.

Re: Aggregator performance

Posted: Wed Mar 25, 2009 11:26 pm
by muruganr117
Is the input pre-sorted on the GROUP BY columns used in the Aggregator before it is processed in your job?

regards

Re: Aggregator performance

Posted: Wed Mar 25, 2009 11:30 pm
by vijay.barani
I have taken 6 columns directly from the source table, without any conditions.



Posted: Wed Mar 25, 2009 11:34 pm
by ray.wurlod
Encase your design in Code tags so we can understand it better.

Posted: Thu Mar 26, 2009 12:25 am
by vijay.barani
ray.wurlod wrote:Encase your design in Code tags so we can understand it better. ...




                                                      3 lookups 
                                                          | | | 
source drs---->aggr---->trsfrmr----->trsfrmr----->target drs 
                                                           | | 
                                                 2 more targets 

Posted: Thu Mar 26, 2009 12:36 am
by Pagadrai
Hi,
You can try the following:

1) Remove the Aggregator and see if you can fetch the SUM value from the DB itself.
2) Replace the 2 Transformers with a single Transformer.
3) I don't know your lookup logic, but see whether the 3 lookups can be combined into one stage.
4) Try replacing the target with a file to see whether that stage is causing the performance issue.
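
Suggestion 1 can be sketched roughly as follows. This is only an illustration, using Python's built-in sqlite3 with an in-memory database in place of the real source; the table name src and the column names k1..k5 and amount are made up for the example and do not come from the job in this thread. The point is that the database returns one row per group, so the job receives far fewer rows than the raw table holds.

```python
import sqlite3

def aggregate_in_db(rows):
    """Load rows into a throwaway table and let the database do the SUM.

    Table and column names (src, k1..k5, amount) are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (k1, k2, k3, k4, k5, amount REAL)")
    conn.executemany("INSERT INTO src VALUES (?, ?, ?, ?, ?, ?)", rows)
    # GROUP BY does the aggregation in the database; ORDER BY also hands
    # the result back sorted on the grouping columns.
    result = conn.execute(
        "SELECT k1, k2, k3, k4, k5, SUM(amount) "
        "FROM src "
        "GROUP BY k1, k2, k3, k4, k5 "
        "ORDER BY k1, k2, k3, k4, k5"
    ).fetchall()
    conn.close()
    return result

sample = [
    ("a", "b", "c", "d", "e", 1.0),
    ("x", "b", "c", "d", "e", 4.0),
    ("a", "b", "c", "d", "e", 2.5),
]
totals = aggregate_in_db(sample)
```

In a real job the same SELECT would go into the source stage's user-defined SQL, assuming the source database is not itself the bottleneck.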

Posted: Thu Mar 26, 2009 3:25 am
by vijay.barani
Hi,
Thank you.
I have removed the Aggregator stage and taken the sum directly from the DB, but the job still takes the same time.
Yes, I ran the newly modified job after removing the first job.
The three lookups are on a single Transformer stage, and for only one column.
Also, there is no change if I replace the target DRS stage with a simple Sequential File stage.

I don't know why it is consuming so much time.

Posted: Thu Mar 26, 2009 3:36 am
by vijay.barani


              3 lookups 
                     | | | 
source drs---->trsfrmr----->target drs 
                    
This is my new job design.
But there must be some use for an Aggregator stage?

Posted: Thu Mar 26, 2009 3:43 am
by ray.wurlod
If you can arrange to have your source data sorted by the five grouping columns, and advise the Aggregator that this is the case, your execution time for that stage will reduce by orders of magnitude.
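
To illustrate why sorted input helps, here is a minimal Python sketch of the general technique, not of the Aggregator's actual internals. The function names and the row layout (five key columns followed by one amount) are assumptions made for the example. With unsorted input every group total has to be held until the last row arrives; with input sorted on the grouping columns, a group is complete as soon as the key changes, so only one group is in memory at a time and results can flow downstream immediately.

```python
from itertools import groupby

def sum_unsorted(rows, key_count=5):
    """Hash-style aggregation: all groups stay in memory until the end."""
    totals = {}
    for row in rows:
        key = tuple(row[:key_count])
        totals[key] = totals.get(key, 0) + row[key_count]
    return [key + (total,) for key, total in totals.items()]

def sum_sorted(rows, key_count=5):
    """Streaming aggregation: assumes rows are already sorted on the keys.

    Each group is emitted as soon as the key changes, so only one
    running total is held at any time.
    """
    for key, group in groupby(rows, key=lambda r: tuple(r[:key_count])):
        yield key + (sum(r[key_count] for r in group),)

sample = [
    ("a", 1, 1, 1, 1, 10),
    ("b", 1, 1, 1, 1, 5),
    ("a", 1, 1, 1, 1, 7),
]
streamed = list(sum_sorted(sorted(sample)))
hashed = sorted(sum_unsorted(sample))
```

Both functions give the same totals; the difference is only in how much has to be buffered. If the source is a database, an ORDER BY on the five grouping columns in the source query is one way to deliver sorted input.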