Aggregation problem

goriparthi · Post by **goriparthi** » Fri Aug 25, 2006 10:54 am

Hi,

I have a job in which data should be aggregated on four columns (dealer, invoice, year, month), taking the sum of sales.
I have 4.5 million records; the job aborts at around 1 million rows and gives only two warnings:

CopyOfCopyOfTest_MOP_RTLFDTC..Aggr_Sales: %s
Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected

When I remove invoice from the four columns, the job runs fine.
It also runs fine up to 600,000 rows or so without any warning, even with invoice included.
I suspected a data issue and added null handling, but it made no difference.

Any suggestions?

Thanks
Raj

kris007 · Post by **kris007** » Fri Aug 25, 2006 10:59 am

goriparthi wrote: Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected

Is this your Aggregator stage or a Transformer stage? Also, are you sorting the data flowing into the Aggregator stage? Are you specifying how the data arrives on the Inputs tab of the Aggregator stage? I think you are running out of temp space. You might want to sort the data, tell the Aggregator stage how the data is coming in, and then see if it aborts again. BTW, what error messages do you get apart from those two warnings?

thumsup9 · Post by **thumsup9** » Fri Aug 25, 2006 12:06 pm

I assume that you have sorted the data before aggregation

goriparthi · Post by **goriparthi** » Fri Aug 25, 2006 12:10 pm

thumsup9 wrote:I assume that you have sorted the data before aggregation

No, I didn't. I am doing it now.

thumsup9 · Post by **thumsup9** » Fri Aug 25, 2006 12:12 pm

That should fix the issue. Let us know!

kris007 · Post by **kris007** » Fri Aug 25, 2006 12:12 pm

Also make sure that you specify the sort order of the input data on the Inputs tab of the Aggregator stage.

chulett · Post by **chulett** » Fri Aug 25, 2006 12:53 pm

Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.

If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.
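
For anyone who wants to see the idea outside of DataStage, here is a minimal Python sketch of aggregating pre-sorted input. It is not how the Aggregator stage is implemented, only an illustration of the same principle: when rows arrive sorted on the grouping columns, only one running total needs to be held at a time, and a result can be emitted at each group change. The function name `aggregate_sorted`, the dictionary row layout, and the sample values are all assumptions made for this example.

```python
from itertools import groupby
from operator import itemgetter

# Grouping columns from the original question (row layout is hypothetical).
KEY = itemgetter("dealer", "invoice", "year", "month")


def aggregate_sorted(rows):
    """Yield (key, total_sales) at each group change.

    Assumes rows arrive sorted ascending on KEY. Only one group's running
    total is held in memory at any moment.
    """
    previous_key = None
    for key, group in groupby(rows, key=KEY):
        # Rough analogue of asserting the sort order: fail fast if a key
        # sorts before the one that was just finished.
        if previous_key is not None and key < previous_key:
            raise ValueError(f"Row out of sequence: {key} after {previous_key}")
        previous_key = key
        yield key, sum(row["sales"] for row in group)


# Made-up sample rows, already sorted on the four grouping columns.
rows = [
    {"dealer": "D1", "invoice": "I1", "year": 2006, "month": 7, "sales": 10.0},
    {"dealer": "D1", "invoice": "I1", "year": 2006, "month": 7, "sales": 5.0},
    {"dealer": "D1", "invoice": "I2", "year": 2006, "month": 8, "sales": 2.5},
    {"dealer": "D2", "invoice": "I3", "year": 2006, "month": 8, "sales": 1.0},
]
totals = list(aggregate_sorted(rows))
```

With unsorted input, an aggregation has to keep every group it has seen until the last row arrives, which is the likely reason a high-cardinality column such as invoice pushes the job over the edge.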

goriparthi · Post by **goriparthi** » Fri Aug 25, 2006 1:01 pm

chulett wrote: Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.

If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.

Yes, I am doing the same thing. Once it is completed I will post an update.

Thanks for your responses

DSguru2B · Post by **DSguru2B** » Fri Aug 25, 2006 1:46 pm

If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the UNIX level; again, it will be a bit faster.
IMHO.
Regards,
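
As a rough illustration of pushing the aggregation down to the database, here is a small Python sketch using the standard-library `sqlite3` module as a stand-in for whatever RDBMS the job actually reads from. The table name `sales`, its column definitions, and the sample rows are assumptions for the example; only the four grouping columns and the sum of sales come from the original question.

```python
import sqlite3

# In-memory SQLite database standing in for the real source database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "dealer TEXT, invoice TEXT, year INTEGER, month INTEGER, sales REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("D1", "I1", 2006, 7, 10.0),
        ("D2", "I3", 2006, 8, 1.0),
        ("D1", "I1", 2006, 7, 5.0),
        ("D1", "I2", 2006, 8, 2.5),
    ],
)

# The database groups and sums; the job only receives one row per group.
totals = conn.execute(
    """
    SELECT dealer, invoice, year, month, SUM(sales) AS total_sales
    FROM sales
    GROUP BY dealer, invoice, year, month
    ORDER BY dealer, invoice, year, month
    """
).fetchall()
conn.close()
```

The same `GROUP BY` would go into the source query of the job, so the rows never need to be sorted or aggregated downstream.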

goriparthi · Post by **goriparthi** » Fri Aug 25, 2006 2:34 pm

DSguru2B wrote: If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the UNIX level; again, it will be a bit faster.
IMHO.
Regards,

Guys,

I got the job working. The only problem was that I didn't supply sorted data to the Aggregator. And yes, I am changing the job to do the aggregation in the database itself, so that there will be a big performance increase.

Thanks all of you for your suggestions

chulett · Post by **chulett** » Fri Aug 25, 2006 2:40 pm

Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.

goriparthi · Post by **goriparthi** » Fri Aug 25, 2006 2:52 pm

chulett wrote:Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.

Sure Craig,

Thanks

Raj

rwierdsm · Post by **rwierdsm** » Thu Aug 31, 2006 6:10 am

chulett wrote:Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.

At these times, you can still use the database to do your sort - do as much work up front in the RDBMS as you can.

Rob

chulett · Post by **chulett** » Thu Aug 31, 2006 7:24 am

My advice was on how to use the Aggregator stage so it's not a bottleneck in the job and won't fall over dead when working through a large volume of data. My point was to keep it in mind when the work isn't possible 'up front'. There will be times when you do what you can in the database, then need to do additional transformations on the data in your job, and then do your aggregation.

Sure, I could 'still use the database' by loading the data back into my database, doing the aggregation / sort there, and then extracting it back out, but I can also easily leverage an Aggregator there.
