Aggregation problem



goriparthi
Charter Member
Posts: 57
Joined: Fri Feb 24, 2006 7:44 am

Aggregation problem

Post by goriparthi »

Hi,


I have a job in which data should be aggregated on four columns (dealer, invoice, year, month), taking the sum of sales.
I have 4.5 million records; the job aborts at around 1 million rows and gives only two warnings:

CopyOfCopyOfTest_MOP_RTLFDTC..Aggr_Sales: %s
Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected

When I remove invoice from the four columns, the job runs fine.
It also runs fine up to 600,000 rows or so without any warning, even with invoice included.
I suspected a data issue and added null handling, but it made no difference.

Any suggestions?


Thanks
Raj
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Re: Aggregation problem

Post by kris007 »

goriparthi wrote: Abnormal termination of stage CopyOfCopyOfTest_MOP_RTLFDTC..Transformer_79 detected
Is this your Aggregator stage or a Transformer stage? Also, are you sorting the data flowing into the Aggregator stage? Are you declaring the sort order of the incoming data on the Inputs tab of the Aggregator stage? I think you are running out of temp space. You might want to sort the data, tell the Aggregator stage how the data is sorted, and then see if it aborts again. BTW, what error messages do you get apart from those two warnings?
Kris

Where's the "Any" key?-Homer Simpson
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

I assume that you have sorted the data before aggregation
goriparthi
Charter Member
Posts: 57
Joined: Fri Feb 24, 2006 7:44 am

Post by goriparthi »

thumsup9 wrote:I assume that you have sorted the data before aggregation
No, I didn't. I am doing it now.
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

I hope that fixes the issue. Let us know!
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Also make sure that you specify the sort order of the input data on the Inputs tab of the Aggregator stage.
Kris

Where's the "Any" key?-Homer Simpson
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.

If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.
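
To illustrate the idea outside DataStage, here is a minimal Python sketch of aggregating pre-sorted input. It is not how the Aggregator stage is implemented; the function name `aggregate_sorted`, the field names, and the sample rows are assumptions made for the example. It shows why sorted input lets a total be emitted at every group change while holding only one group in memory, and why a wrongly asserted sort order has to be treated as an error.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key_fields=("dealer", "invoice", "year", "month")):
    """Yield (key, total_sales) each time the grouping key changes.

    Assumes rows arrive sorted ascending on key_fields, so only the
    current group is held in memory. Names here are hypothetical.
    """
    key_of = itemgetter(*key_fields)
    previous_key = None
    for key, group in groupby(rows, key=key_of):
        # A key that sorts before the previous one means the asserted
        # sort order was wrong; the message mimics the forum's error text.
        if previous_key is not None and key < previous_key:
            raise ValueError(f"Row out of sequence: {key} after {previous_key}")
        previous_key = key
        yield key, sum(row["sales"] for row in group)


# Hypothetical sample rows, already sorted on the four grouping columns.
rows = [
    {"dealer": "D1", "invoice": "I1", "year": 2006, "month": 1, "sales": 100.0},
    {"dealer": "D1", "invoice": "I1", "year": 2006, "month": 1, "sales": 50.0},
    {"dealer": "D1", "invoice": "I2", "year": 2006, "month": 1, "sales": 25.0},
    {"dealer": "D2", "invoice": "I3", "year": 2006, "month": 2, "sales": 10.0},
]

results = list(aggregate_sorted(rows))
for key, total in results:
    print(key, total)
```

An unsorted aggregation would instead need one running total per distinct key until the last row is read, which is where memory or temp space can run out when a high-cardinality column such as invoice is part of the key.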
-craig

"You can never have too many knives" -- Logan Nine Fingers
goriparthi
Charter Member
Posts: 57
Joined: Fri Feb 24, 2006 7:44 am

Post by goriparthi »

chulett wrote:Make sure you sort the data in a manner that supports the aggregation being done. Get it wrong and the stage will ignore your sort work and handle it itself. Also make sure you 'assert' the sort order correctly in the stage; get that wrong and the job will abort with a 'Row out of sequence' error.

If it all works, the Aggregator stage will no longer be a bottleneck in your job but will instead deliver rows out whenever a group change occurs.

Yes, I am doing the same thing. Once it is completed, I will update.

Thanks for your responses
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the Unix level; again, it will be a bit faster.
IMHO.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
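
As a rough sketch of what "aggregation at the database level" could look like for this job, the following Python snippet uses an in-memory SQLite database as a stand-in for the real source. The table name `sales_fact`, its column types, and the sample rows are assumptions; only the four grouping columns and the sum of sales come from the thread.

```python
import sqlite3

# In-memory SQLite stands in for the real source database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    "dealer TEXT, invoice TEXT, year INTEGER, month INTEGER, sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("D1", "I1", 2006, 1, 100.0),
        ("D2", "I3", 2006, 2, 10.0),
        ("D1", "I1", 2006, 1, 50.0),
        ("D1", "I2", 2006, 1, 25.0),
    ],
)

# The database groups and sums, so the job only receives one row per group.
aggregated = conn.execute(
    """
    SELECT dealer, invoice, year, month, SUM(sales) AS total_sales
    FROM sales_fact
    GROUP BY dealer, invoice, year, month
    ORDER BY dealer, invoice, year, month
    """
).fetchall()
conn.close()

for row in aggregated:
    print(row)
```

In a real job the same `SELECT ... GROUP BY` would go into the source stage's user-defined SQL, so the 4.5 million detail rows never leave the database.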
goriparthi
Charter Member
Posts: 57
Joined: Fri Feb 24, 2006 7:44 am

Post by goriparthi »

DSguru2B wrote:If your source is a database, I would suggest doing the aggregation at the database level. It will be a bit faster.
If not, then do the sorting at the Unix level; again, it will be a bit faster.
IMHO.
Regards,
Guys,

Yes, I got the job working. It failed only because I didn't supply sorted data to the Aggregator. And yes, I am changing the job to do the aggregation in the database itself, so that there will be a big performance increase.


Thanks all of you for your suggestions
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.
-craig

"You can never have too many knives" -- Logan Nine Fingers
goriparthi
Charter Member
Posts: 57
Joined: Fri Feb 24, 2006 7:44 am

Post by goriparthi »

chulett wrote:Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.
Sure Craig,


Thanks

Raj
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

chulett wrote:Well, keep the technique in mind for those times when you can't do the aggregation in the source database - times where you are aggregating on transformed data, for instance.
At these times, you can still use the database to do your sort - do as much work up front in the RDBMS as you can.

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
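
A hedged sketch of the combination discussed here: the database does the sort with `ORDER BY`, the job applies a transformation, and the aggregation then streams over the sorted rows. SQLite stands in for the RDBMS, and the table `sales_raw`, the `sales_cents` column, and the cents-to-currency transformation are all hypothetical. Note that the transformation leaves the grouping columns untouched, so the database's sort order is still valid when the rows reach the aggregation.

```python
import sqlite3
from itertools import groupby

# In-memory SQLite stands in for the source RDBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_raw ("
    "dealer TEXT, invoice TEXT, year INTEGER, month INTEGER, sales_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_raw VALUES (?, ?, ?, ?, ?)",
    [
        ("D2", "I3", 2006, 2, 1000),
        ("D1", "I1", 2006, 1, 10000),
        ("D1", "I2", 2006, 1, 2500),
        ("D1", "I1", 2006, 1, 5000),
    ],
)

# The database sorts on the grouping columns, in the order the aggregation uses.
cursor = conn.execute(
    "SELECT dealer, invoice, year, month, sales_cents "
    "FROM sales_raw ORDER BY dealer, invoice, year, month"
)


def transform(row):
    """Hypothetical in-job transformation: convert cents to currency units."""
    dealer, invoice, year, month, sales_cents = row
    return (dealer, invoice, year, month), sales_cents / 100


# Stream over the sorted, transformed rows and total each group as it ends.
totals = [
    (key, sum(amount for _, amount in group))
    for key, group in groupby(map(transform, cursor), key=lambda pair: pair[0])
]
conn.close()

for key, total in totals:
    print(key, total)
```

If the transformation changed one of the grouping columns, the rows would have to be re-sorted on the transformed values before aggregating.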
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

My advice was on how to use the Aggregator stage so it's not a bottleneck in the job and won't fall over dead working a large volume of data. My point was to keep it in mind when the work isn't possible 'up front'. There will be times when you do what you can in the database, then need to do additional transformations on the data in your job and then do your aggregation.

Sure, I could 'still use the database' by loading the data back into my database, doing the aggregation / sort there and then extracting it back out, but I can also easily leverage an Aggregator there.
-craig

"You can never have too many knives" -- Logan Nine Fingers