Aggregation problem
-
- Charter Member
- Posts: 57
- Joined: Fri Feb 24, 2006 7:44 am
Aggregation problem
Hi,
I have a job in which data should be aggregated on four columns (dealer, invoice, year, month), taking the sum of sales.
I have 4.5 million records; the job aborts at around 1 million rows and gives only two warnings:
CopyOfCopyOfTest_MOP_RTLFDTC..Aggr_Sales: %s
Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected
When I remove invoice from the four columns, the job runs fine.
It also runs fine up to 600,000 rows or so without any warning, even with invoice included.
I suspected a data issue and added null handling, but it made no difference.
Any suggestions?
Thanks
Raj
Re: Aggregation problem
goriparthi wrote: Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected
Is this your Aggregator stage or your Transformer stage? Also, are you sorting the data flowing into the Aggregator stage? Are you specifying how the data arrives on the Inputs tab of the Aggregator stage? I think you are running out of temp space. You might want to sort the data, tell the Aggregator stage how the data is coming in, and then see if it aborts again. BTW, what error messages do you get apart from those two warnings?
Kris
Where's the "Any" key?-Homer Simpson
-
- Charter Member
- Posts: 57
- Joined: Fri Feb 24, 2006 7:44 am
Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.
If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.
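To illustrate the idea outside DataStage, here is a minimal Python sketch of aggregating pre-sorted input. It is not how the Aggregator stage is implemented; the function name `aggregate_sorted` and the sample rows are made up for illustration. Because the rows arrive sorted on the grouping columns, only one running group is held in memory, each total is emitted as soon as the key changes, and a key that arrives out of order raises an error, loosely mirroring the 'Row out of sequence' abort.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key_cols=("dealer", "invoice", "year", "month"),
                     value_col="sales"):
    """Yield (key, total) per group from rows already sorted on key_cols.

    Only one group is buffered at a time. Raises ValueError if a group
    key sorts before the previous group's key (input was not sorted).
    """
    key_of = itemgetter(*key_cols)
    previous = None
    for key, group in groupby(rows, key=key_of):
        if previous is not None and key < previous:
            raise ValueError(f"row out of sequence: {key} after {previous}")
        previous = key
        # The total is emitted as soon as the key changes.
        yield key, sum(row[value_col] for row in group)


# Hypothetical sample data, already sorted on the four grouping columns.
rows = [
    {"dealer": "D1", "invoice": "A100", "year": 2006, "month": 1, "sales": 10.0},
    {"dealer": "D1", "invoice": "A100", "year": 2006, "month": 1, "sales": 5.0},
    {"dealer": "D1", "invoice": "A101", "year": 2006, "month": 2, "sales": 7.5},
    {"dealer": "D2", "invoice": "B200", "year": 2006, "month": 1, "sales": 1.0},
]
totals = list(aggregate_sorted(rows))
```

With unsorted input, an aggregator has to keep every distinct key combination until the last row has been read, which is one plausible reason a high-cardinality column such as invoice makes the difference between a job that finishes and one that runs out of resources.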
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Charter Member
- Posts: 57
- Joined: Fri Feb 24, 2006 7:44 am
chulett wrote: Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.
If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.
Yes, I am doing the same thing; once it has completed I will post an update.
Thanks for your responses
If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the Unix level; again, it will be a bit faster.
IMHO.
Regards,
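As a rough sketch of what "aggregation at the database level" could look like, the snippet below pushes the grouping into SQL. Everything here is an assumption for illustration: SQLite stands in for whatever RDBMS the source really is, and the table name `sales_fact` and its columns are hypothetical, modelled on the four grouping columns and the sales measure from the original post.

```python
import sqlite3

# In-memory SQLite database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    "dealer TEXT, invoice TEXT, year INTEGER, month INTEGER, sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("D2", "B200", 2006, 1, 1.0),
        ("D1", "A100", 2006, 1, 10.0),
        ("D1", "A101", 2006, 2, 7.5),
        ("D1", "A100", 2006, 1, 5.0),
    ],
)

# The database groups and sums, so the ETL job only receives one row
# per (dealer, invoice, year, month) combination.
query = """
    SELECT dealer, invoice, year, month, SUM(sales) AS total_sales
    FROM sales_fact
    GROUP BY dealer, invoice, year, month
    ORDER BY dealer, invoice, year, month
"""
result = conn.execute(query).fetchall()
conn.close()
```

If the aggregation itself cannot be done in the database, keeping only the `ORDER BY` in the extract query is one way to hand sorted rows to a downstream aggregation step.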
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Charter Member
- Posts: 57
- Joined: Fri Feb 24, 2006 7:44 am
DSguru2B wrote: If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the Unix level; again, it will be a bit faster.
IMHO.
Regards,
Guys,
Yes, I got the job working. It was failing only because I didn't supply sorted data to the Aggregator. And yes, I am changing the job to do the aggregation in the database itself, which should improve performance considerably.
Thanks to all of you for your suggestions.
-
- Premium Member
- Posts: 209
- Joined: Fri Jan 09, 2004 1:14 pm
- Location: Toronto, Canada
chulett wrote: Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.
At these times, you can still use the database to do your sort - do as much work up front in the RDBMS as you can.
Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
My advice was on how to use the Aggregator stage so it's not a bottleneck in the job and won't fall over dead working a large volume of data. My point was to keep it in mind when the work isn't possible 'up front'. There will be times when you do what you can in the database, then need to do additional transformations on the data in your job and then do your aggregation.
Sure, I could 'still use the database' by loading the data back into my database, doing the aggregation / sort there and then extracting it back out, but I can also easily leverage an Aggregator there.
-craig
"You can never have too many knives" -- Logan Nine Fingers