Amount of memory that Aggregator consumes
Posted: Wed Apr 04, 2007 3:22 pm
Hi,
We have our DS Server on Windows and have a requirement to process a huge number of records, around 40 million.
The source is a flat file. We have a requirement to aggregate the data midway through the job. Obviously DataStage cannot hold that many records in memory; it is going to abort. When I searched the forum for this, the suggestion was to feed the data in pre-sorted and tell the Aggregator that the data is already sorted. Hopefully the Aggregator could then handle any number of records this way.
But if I use a Sort stage to sort the data before sending it to the Aggregator, the Sort stage is again going to hold all the records in memory while sorting. Presumably this would also abort.
So, what would be a workable approach? The questions we have are:
1) Is there anywhere I can modify / increase the amount of memory that the Sort / Aggregator stages use, so that they can handle a larger number of records?
2) The source is a 200GB file containing 40 million records. How much memory (RAM) would DataStage consume for this kind of processing? If I use the Aggregator, does DataStage need 200GB of RAM to hold all the records in memory?
These are the questions that are boggling me.
Thanks.
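For anyone wondering why pre-sorted input helps: once records arrive grouped by key, an aggregator only needs to hold one group's running totals at a time, so memory stays roughly constant regardless of row count. A minimal sketch of that idea in Python (hypothetical field layout, not DataStage code):

```python
# Streaming aggregation over input that is already sorted by key:
# only the current group is ever held in memory.
from itertools import groupby

def aggregate_sorted(records):
    """records: iterable of (key, value) pairs, already sorted by key."""
    for key, group in groupby(records, key=lambda r: r[0]):
        # 'group' is consumed lazily; only this group's rows are touched here
        total = sum(v for _, v in group)
        yield key, total

# Example with three groups streamed one at a time
rows = [("A", 1), ("A", 2), ("B", 5), ("C", 3), ("C", 4)]
print(list(aggregate_sorted(rows)))  # [('A', 3), ('B', 5), ('C', 7)]
```

This is the same reason telling the Aggregator the data is sorted changes its behavior: it can emit each group as soon as the key changes instead of buffering every group until end-of-data.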