Amount of memory that Aggregator consumes

vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Hi,

We have our DS Server on Windows and have a requirement to process a huge number of records, around 40 million.

The source is a flat file, and midway through the job we need to aggregate the data. Obviously DataStage cannot hold that number of records in memory; it is going to abort. When I searched the forum for this, the suggestion was to feed the data pre-sorted and tell the Aggregator that the data is already sorted. Hopefully the Aggregator could then process any number of records this way.

But if I use the Sort stage to sort the data before sending it to the Aggregator, the Sort stage is in turn going to sort it by holding all records in memory. Presumably, this would abort as well.

So, what would be a plausible approach? The questions that we have are:

1) Is there any place where I could modify / increase the amount of memory that the Sort / Aggregator stage uses, so that it can work for a larger number of records?

2) The source is a 200GB file containing 40 million records. How much memory (RAM) would DataStage consume for this kind of processing? If I'm going to use the Aggregator, does DataStage expect 200GB of RAM to hold all the records in memory while processing them?

These are some of the questions that are puzzling me.

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Worry about the sorting, not the aggregating. The Aggregator will only need enough memory to hold a single 'sort group' at a time if you've properly presorted the data and asserted the sort order in the Aggregator.
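To illustrate why presorted input keeps memory use small, here is a minimal Python sketch of aggregating a stream that is already sorted by its grouping key. It is not how the Aggregator stage is implemented; the function name `aggregate_sorted` and the sample rows are made up for illustration. The point is only that the running totals for one group are all that needs to be held at a time.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Yield (key, count, total) for (key, amount) rows already sorted by key.

    Hypothetical helper: only the running count and total of the current
    group are kept in memory, never the whole input.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        count = 0
        total = 0
        for _, amount in group:
            count += 1
            total += amount
        yield key, count, total


# Sample data, assumed to be sorted on the first field.
rows = [("A", 10), ("A", 5), ("B", 7), ("C", 1), ("C", 2)]
result = list(aggregate_sorted(rows))
print(result)
```

If the input is not actually sorted, the same key shows up in more than one output group, which is why the sort order has to be correct before it is asserted.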

I wouldn't consider using the DataStage sort stage for this volume unless you've got no other choice. Hopefully, some sort of 'high speed' sort package could be leveraged for this, or on a UNIX server the command line 'sort' can typically be used. Any chance of that?
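For a sense of how a sort can handle more data than fits in RAM, here is a rough Python sketch of an external merge sort: sort fixed-size chunks, spill each to a temporary file, then merge the sorted runs. This is only the general technique, not the implementation of any particular sort package or of the DataStage Sort stage; `external_sort` and `chunk_size` are names invented for this example, and every input line is assumed to end with a newline.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, chunk_size=100_000):
    """Yield newline-terminated lines in sorted order.

    Hypothetical helper: at most chunk_size lines are sorted in memory at
    once; the merge step then keeps only one pending line per run.
    """
    run_paths = []
    source = iter(lines)
    try:
        # Phase 1: write sorted runs to temporary files.
        while True:
            chunk = sorted(islice(source, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run_file:
                run_file.writelines(chunk)
            run_paths.append(path)

        # Phase 2: merge all runs lazily, line by line.
        run_files = [open(path) for path in run_paths]
        try:
            yield from heapq.merge(*run_files)
        finally:
            for run_file in run_files:
                run_file.close()
    finally:
        # Clean up the temporary run files.
        for path in run_paths:
            os.remove(path)


data = ["pear\n", "apple\n", "fig\n", "banana\n", "cherry\n"]
sorted_lines = list(external_sort(data, chunk_size=2))
print(sorted_lines)
```

The trade-off is disk I/O and temporary space roughly equal to the input size, in exchange for memory use bounded by the chunk size.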
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You can tune the amount of memory that the Aggregator stage uses (for a particular job) using the final option in the DS.TOOLS menu on the server. However, if you actually run out of memory, all bets are off - you can tune for 16GB but, on a system with only 4GB, this would be foolish in the extreme.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Is the DS.TOOLS menu available on the Windows Server as well?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sure. From a TCL prompt.
-craig

"You can never have too many knives" -- Logan Nine Fingers