Is there any limit for aggregator stage in handling rows
Every time I run the job it fails with the message 'aggregator terminating abnormally'. Is this because of the huge input volume?
I am not able to process a large input volume of 3 lakh (300,000) records; for now I split the input to work around this.
Is there any other way to handle the situation? The Aggregator stage works fine with input volumes below 2 lakh (200,000).
Is there a limit on the input the Aggregator stage can handle, and if so, how much?
thanks in advance,
Hi VasanthRm,
I don't think there is any limit on the number of rows the Aggregator stage can handle. However, the data you supply to the Aggregator must be sorted, and that sort order must be specified in the Aggregator.
Could you please let me know whether you are sorting the data before you send it to the Aggregator?
Thanks,
Naveen
Hmmm... a lakh is 10-man, which is 0x186A0. Very clear...
As Clarcombe mentioned, if your data is sorted then the Aggregator stage doesn't need to load and keep every record in memory; it only needs to compute values until the next group-level change.
Using sorted input is the best and most efficient method.
The limit that the aggregator stage has is memory. You can roughly estimate that (on unsorted data) the whole data stream needs to be kept in virtual memory until the last row has been read. Check your data size and look at your ulimit value and it will probably be quite clear why your process is aborting.
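The point above can be checked from the command line. This is a minimal sketch, assuming the Aggregator's feed is a flat file (the filename and the tiny two-row stand-in data are hypothetical); it compares the file size against the process virtual-memory limit that `ulimit -v` reports:

```shell
# Rough check: would an unsorted aggregation of this file fit in
# virtual memory? With unsorted input, the whole stream must be held
# until the last row is read.
datafile=input.txt
printf 'a,1\nb,2\n' > "$datafile"   # tiny stand-in for the real feed
bytes=$(wc -c < "$datafile")        # total data size in bytes
vmem_kb=$(ulimit -v)                # 'unlimited' or a figure in KB
echo "data: ${bytes} bytes, ulimit -v: ${vmem_kb}"
```

If the data size approaches the `ulimit` figure, the abort ArndW describes is the expected outcome.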
As noted, there is a limit on the number of rows the Aggregator can handle. That limit isn't a fixed row count, of course; the total size of the rows being aggregated is more the issue.
The only way I've found to successfully agg millions of rows is by presorting them. Then it only needs to keep one 'sort group' in memory at a time and can push rows through when there is a change.
But don't think the Sort stage can handle millions of rows either! You generally need to fall back on command line sort options or database sorts to accomplish that task for you.
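A command-line pre-sort like the one suggested above might look like this. It's a sketch, assuming a comma-delimited file whose first column is the grouping key (the filenames and sample rows are made up):

```shell
# Pre-sort a delimited file on its grouping column so the Aggregator
# only ever holds one sort group in memory at a time.
printf 'b,2\na,1\nb,3\n' > unsorted.csv   # stand-in data
sort -t, -k1,1 unsorted.csv > sorted.csv  # -t, = comma delimiter,
                                          # -k1,1 = sort on field 1 only
cat sorted.csv
```

After this, the Aggregator's input link must still be told the data arrive sorted on that column, as discussed later in the thread.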
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Thanks for the info.
But is there any limit for the Sort stage? It is quite logical that sorted data can be better handled by the Aggregator.
To add to that:
Can I use a Hashed File stage if my requirement is to sort the data, eliminate the duplicates, and keep the first incoming record? Which algorithm in the Hashed File stage would suit this requirement?
Any ideas? This seems very interesting.
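For what it's worth, the "keep the first incoming record per key" behaviour asked about here can also be sketched outside DataStage. This is a hedged command-line stand-in, not the Hashed File stage itself (the filenames and data are hypothetical, and the key is assumed to be column 1):

```shell
# Keep only the first record seen for each key (column 1), preserving
# arrival order. Later duplicates of the same key are dropped, which
# mirrors a "first write wins" hashed-file load.
printf 'a,1\nb,2\na,9\n' > feed.csv
awk -F, '!seen[$1]++' feed.csv > dedup.csv  # prints a line only the
                                            # first time its key appears
cat dedup.csv
```

Note that `sort -u` would also de-duplicate, but it does not guarantee the *first arriving* record survives; the `awk` form does.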
You can use a UV stage (or any other database stage) to sort your data.
If the data are in a file, however, you may find it easier - and faster - to use the UNIX sort command (perhaps as a before/stage subroutine) to effect the sort. Sort by the grouping column(s).
You must then inform the Aggregator stage - on its input link - that the data are sorted by the grouping column(s).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.