Hi,
I have an aggregator that must aggregate 20M records, but it aborts.
I suppose it could be a memory problem. The aggregator works well up to 6-7M records; above that amount it fails.
How can I resolve this problem?
I tried to use the Sort plug-in, but it is very slow. Can the Sort plug-in aggregate the records?
Thanks in advance,
Mario
Aggregator - problem with memory
Moderators: chulett, rschirm, roy
- vmcburney
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
On those older versions of DS Aggregator should be renamed Aggrevator. If your source data is in a table you may get much better performance by doing the aggregation in the source database plug-in. In DataStage the sort stage will sort but not aggregate.
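To illustrate the point about pushing the aggregation into the source database: a GROUP BY lets the database return the already-aggregated result set, so the job never has to hold 20M detail rows. A minimal sketch using SQLite, with made-up table and column names (sales, region, amount) purely for illustration:

```python
# Hedged sketch: aggregate in the source database with GROUP BY instead of
# feeding raw rows to the job's Aggregator. Schema and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 10.0), ("EU", 5.0), ("US", 7.5)],
)

# The database does the grouping and summing; the job only reads the
# small, pre-aggregated result set.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 15.0), ('US', 7.5)]
```

The same idea applies to any source database stage that accepts user-defined SQL: the heavy lifting happens where the data already lives.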
Run your job again and keep an eye on temp file space as the job runs. The aggregator writes a lot of data to temporary files while it aggregates the input data.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Actually, it's the Sort stage that uses temp files, the Aggregator works in memory without landing anything. Unless things were different back in 5.x but I don't believe so.
You can substantially reduce the amount of memory (and time) used by the Aggregator by presorting the data and asserting the sort order in the Aggregator stage by marking the appropriate fields. Then again, this advantage may be offset by the amount of time and resources it takes to sort the data in the job. If the Sort stage is too slow, is there any way you can deliver the data to the job already sorted? Perhaps a simple sort at the UNIX level, or some external sorting package you may have access to? Or can the data be created in the order required to support the aggregation?
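The reason presorted input helps so much can be sketched in a few lines: when the rows arrive already sorted on the grouping key, each group can be summed and emitted as soon as the key changes, so memory use stays constant no matter how many rows flow through. A small illustration (the key/value layout is invented for the example, not a DataStage API):

```python
# Hedged sketch of streaming aggregation over presorted input: groupby()
# only buffers one group at a time, so memory does not grow with row count.
from itertools import groupby
from operator import itemgetter

rows = [("a", 1), ("a", 2), ("b", 5), ("b", 1), ("c", 3)]  # presorted on key

totals = [
    (key, sum(value for _, value in group))
    for key, group in groupby(rows, key=itemgetter(0))
]
print(totals)  # [('a', 3), ('b', 6), ('c', 3)]
```

By contrast, aggregating unsorted input forces the stage to keep every partial group in memory at once, which is exactly what blows up around 6-7M distinct keys.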
-craig
"You can never have too many knives" -- Logan Nine Fingers
- vmcburney
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
Thanks for the clarification.
I remember last time I had to aggregate a large amount of data the aggregation stage would eventually fail and I had to resort to putting the data into a staging table and aggregating in a database stage.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney