Aggregation problem

Posted: Fri Aug 25, 2006 10:54 am
by goriparthi
Hi,


I have a job in which data should be aggregated on four columns (dealer, invoice, year, month), taking the sum of sales.
I have 4.5 million records; the job aborts at around 1 million rows and gives only two warnings:

CopyOfCopyOfTest_MOP_RTLFDTC..Aggr_Sales: %s
Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected

When I remove invoice from the four columns, the job runs fine.
It also runs fine up to 600,000 rows or so without any warning, even with invoice included.
I suspected a data issue and added null handling, but it did not help.

Any suggestions?


Thanks
Raj

Re: Aggregation problem

Posted: Fri Aug 25, 2006 10:59 am
by kris007
goriparthi wrote: Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected
Is this your Aggregator stage or a Transformer stage? Also, are you sorting the data flowing into the Aggregator stage? Are you specifying how the data arrives on the Inputs tab of the Aggregator stage? I think you are running out of temp space. You might want to sort the data, tell the Aggregator stage how the data is coming in, and then see if it aborts again. BTW, what error messages do you get apart from those two warnings?

Posted: Fri Aug 25, 2006 12:06 pm
by thumsup9
I assume that you have sorted the data before aggregation.

Posted: Fri Aug 25, 2006 12:10 pm
by goriparthi
thumsup9 wrote: I assume that you have sorted the data before aggregation.
No, I didn't. I am doing it now.

Posted: Fri Aug 25, 2006 12:12 pm
by thumsup9
I hope that fixes the issue. Let us know!

Posted: Fri Aug 25, 2006 12:12 pm
by kris007
Also make sure that you specify the sort order of the input data on the Inputs tab of the Aggregator stage.

Posted: Fri Aug 25, 2006 12:53 pm
by chulett
Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.

If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.
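
To illustrate the idea, here is a minimal Python sketch (not DataStage code) of aggregating pre-sorted input: only one running total is held in memory, and each group is emitted as soon as the key changes. The function name `aggregate_sorted` and the row layout (dealer, invoice, year, month, sales) are assumptions made for illustration, and the raised error only mimics the 'Row out of sequence' abort described above.

```python
from typing import Iterable, Iterator, Optional, Tuple

Key = Tuple[str, str, int, int]              # (dealer, invoice, year, month)
Row = Tuple[str, str, int, int, float]       # key columns plus sales


def aggregate_sorted(rows: Iterable[Row]) -> Iterator[Tuple[Key, float]]:
    """Sum sales per key, assuming rows arrive sorted ascending by key."""
    current_key: Optional[Key] = None
    total = 0.0
    for dealer, invoice, year, month, sales in rows:
        key = (dealer, invoice, year, month)
        if current_key is None:
            current_key = key
        elif key != current_key:
            if key < current_key:
                # The asserted sort order does not match the data.
                raise ValueError(f"row out of sequence: {key} after {current_key}")
            # Group change: emit the finished group and start a new one.
            yield current_key, total
            current_key, total = key, 0.0
        total += sales
    if current_key is not None:
        yield current_key, total             # emit the final group
```

Because nothing beyond the current group is retained, memory use does not grow with the number of distinct keys, which is why adding a high-cardinality column such as invoice no longer matters once the input is sorted.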

Posted: Fri Aug 25, 2006 1:01 pm
by goriparthi
chulett wrote: Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.

If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.

Yes, I am doing the same thing. Once it is completed I will update.

Thanks for your responses

Posted: Fri Aug 25, 2006 1:46 pm
by DSguru2B
If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the Unix level; again, it will be a bit faster.
IMHO.
Regards,
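
As a rough illustration of doing the aggregation at the database level, the sketch below uses Python's built-in `sqlite3` with an in-memory database standing in for the real source. The table name `sales`, its columns, and the helper name `aggregate_in_database` are assumptions for illustration only; the actual source schema is not given in this thread.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, int, float]  # dealer, invoice, year, month, sales


def aggregate_in_database(rows: Iterable[Row]) -> List[tuple]:
    """Load rows into a hypothetical 'sales' table and let SQL do the grouping."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales "
            "(dealer TEXT, invoice TEXT, year INTEGER, month INTEGER, sales REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
        # The database groups and sums; the job only receives one row per group.
        query = (
            "SELECT dealer, invoice, year, month, SUM(sales) "
            "FROM sales "
            "GROUP BY dealer, invoice, year, month "
            "ORDER BY dealer, invoice, year, month"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

In a real job the `SELECT ... GROUP BY` would simply be the source query, so far fewer rows ever enter the job.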

Posted: Fri Aug 25, 2006 2:34 pm
by goriparthi
DSguru2B wrote: If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the Unix level; again, it will be a bit faster.
IMHO.
Regards,
Guys,

Yes, I got the job working. It failed only because I didn't supply sorted data to the Aggregator. I am also changing the job to do the aggregation in the database itself, so that there is a large performance gain.

Thanks to all of you for your suggestions.

Posted: Fri Aug 25, 2006 2:40 pm
by chulett
Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.

Posted: Fri Aug 25, 2006 2:52 pm
by goriparthi
chulett wrote: Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.
Sure Craig,


Thanks

Raj

Posted: Thu Aug 31, 2006 6:10 am
by rwierdsm
chulett wrote: Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.
At these times, you can still use the database to do your sort - do as much work up front in the RDBMS as you can.

Rob
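
A hedged sketch of this split, again in Python with `sqlite3` as a stand-in: the database does the `ORDER BY`, the job applies a transformation, and the aggregation then streams over the already-sorted rows with `itertools.groupby`. The `sales` table, the `rate` multiplier, and the function name `sorted_then_aggregate` are hypothetical. Note that this only works while the transformation leaves the grouping columns unchanged; if it alters them, the sort from the database no longer applies.

```python
import sqlite3
from itertools import groupby
from typing import Iterator, Tuple


def sorted_then_aggregate(conn: sqlite3.Connection, rate: float) -> Iterator[Tuple[tuple, float]]:
    """Let the database sort, transform sales in the job, then sum per group."""
    cursor = conn.execute(
        "SELECT dealer, invoice, year, month, sales "
        "FROM sales "
        "ORDER BY dealer, invoice, year, month"
    )
    # groupby relies on equal keys being adjacent, which the ORDER BY provides.
    for key, group in groupby(cursor, key=lambda row: row[:4]):
        # 'rate' stands in for whatever transformation the job applies.
        yield key, sum(row[4] * rate for row in group)
```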

Posted: Thu Aug 31, 2006 7:24 am
by chulett
My advice was on how to use the Aggregator stage so it's not a bottleneck in the job and won't fall over dead working a large volume of data. My point was to keep it in mind when the work isn't possible 'up front'. There will be times when you do what you can in the database, then need to do additional transformations on the data in your job and then do your aggregation.

Sure, I could 'still use the database' by loading the data back into my database, doing the aggregation / sort there and then extracting it back out, but I can also easily leverage an Aggregator there.