
Amount of memory that Aggregator consumes

Posted: Wed Apr 04, 2007 3:22 pm
by vnspn
Hi,

We have our DS Server on Windows and have a requirement to process a huge number of records, around 40 million.

The source is a flat file. We have a requirement in the middle to aggregate the data. Obviously DataStage cannot hold that number of records in memory; it is going to abort. When I searched the forum for this, the suggestion was to feed the data pre-sorted and let the Aggregator know that the data is already sorted. Hopefully the Aggregator could handle any number of records this way.

But if I'm going to use the Sort stage to sort the data before sending it to the Aggregator, the Sort stage is again going to sort it by holding all records in memory. Presumably, this would again abort.

So, what could be a plausible approach? The questions that we have are:

1) Is there any place where I could go and modify / increase the amount of memory that the Sort / Aggregator stage uses, so that it is possible to make it work for a higher number of records?

2) The source is a 200GB file containing 40 million records. How much memory (RAM) would DataStage consume for this kind of processing? If I'm going to use the Aggregator, does DataStage expect 200GB of RAM to hold all the records in memory while processing them?

These are some questions that are boggling me.

Thanks.

Posted: Wed Apr 04, 2007 3:47 pm
by chulett
Worry about the sorting, not the aggregating. The Aggregator will only need enough memory to hold a single 'sort group' at a time if you've properly presorted the data and asserted the sort order in the Aggregator.
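To illustrate why presorted input keeps memory small, here is a minimal sketch in Python (not DataStage code). It assumes rows arrive as (key, value) pairs already sorted by key; the function name aggregate_sorted and the sum/count aggregates are made up for the example. Only the running totals for the current group are held at any time.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Yield (key, total, count) for rows that are already sorted by key.

    Hypothetical helper: only the current group's running totals are kept,
    so memory use does not grow with the total number of rows.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        total = 0
        count = 0
        for _, value in group:
            total += value
            count += 1
        yield key, total, count


# Small presorted sample; in practice this would be a stream read from a file.
rows = [("A", 10), ("A", 5), ("B", 7), ("C", 1), ("C", 2)]
result = list(aggregate_sorted(rows))
```

If the input were not actually sorted, the same key would show up as several separate groups, which is why the sort order has to be correct before it is asserted.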

I wouldn't consider using the DataStage Sort stage for this volume unless you've got no other choice. Hopefully, some sort of 'high speed' sort package could be leveraged for this, or on a UNIX server the command line 'sort' can typically be used. Any chance of that?
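For what it's worth, sorting more data than fits in RAM is usually done with an external merge sort: sort fixed-size chunks in memory, write each to a temporary file, then merge the sorted chunks. The Python sketch below shows only the general idea under stated assumptions; it is not how any particular sort package or the DataStage Sort stage is implemented. The function name external_sort and the chunk_size parameter are hypothetical, and every input line is assumed to end with a newline.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, chunk_size=100_000, key=None):
    """Yield newline-terminated lines in sorted order using bounded memory.

    Hypothetical sketch: at most chunk_size lines are held in memory during
    the chunking phase, and one line per chunk file during the merge phase.
    """
    chunk_paths = []
    it = iter(lines)
    try:
        # Phase 1: sort each chunk in memory and spill it to a temp file.
        while True:
            chunk = sorted(itertools.islice(it, chunk_size), key=key)
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as f:
                f.writelines(chunk)
            chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files.
        files = [open(p) for p in chunk_paths]
        try:
            yield from heapq.merge(*files, key=key)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temp files once the merge is finished or abandoned.
        for p in chunk_paths:
            os.remove(p)


# Tiny demonstration with a deliberately small chunk size.
data = ["pear\n", "apple\n", "fig\n", "kiwi\n", "banana\n"]
sorted_lines = list(external_sort(data, chunk_size=2))
```

This sketch opens every chunk file at once during the merge, so a very large input would need a bigger chunk size or a multi-pass merge to stay under the operating system's open-file limit.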

Posted: Wed Apr 04, 2007 6:57 pm
by ray.wurlod
You can tune the amount of memory that the Aggregator stage uses (for a particular job) using the final option in the DS.TOOLS menu on the server. However, if you actually run out of memory, all bets are off - you can tune for 16GB but, on a system with only 4GB, this would be foolish in the extreme.

Posted: Fri Apr 06, 2007 7:16 am
by vnspn
Is the DS.TOOLS menu available on the Windows Server as well?

Posted: Fri Apr 06, 2007 7:28 am
by chulett
Sure. From a TCL prompt.