Is there any limit for aggregator stage in handling rows

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

VasanthRM
Participant
Posts: 37
Joined: Wed May 11, 2005 3:05 am

Is there any limit for aggregator stage in handling rows

Post by VasanthRM »

Is there any limit for the Aggregator stage in handling rows?

Every time I run the job, it fails with the message 'aggregator terminating abnormally'. Is this because of the huge input volume?

I am not able to process a large input volume of 3 lakh (300,000) records. Currently I split the input volume to handle this situation.

Is there any other way to handle the situation? The Aggregator stage works fine with input volumes of less than 2 lakh (200,000).

Is there any limit on the input the Aggregator stage can handle, and if so, how much?

Thanks in advance.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Is your data sorted? The Aggregator works MUCH better on sorted data.
Colin Larcombe
-------------------

Certified IBM InfoSphere DataStage Developer
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi VasanthRM,

I don't think there is any limit on the number of rows the Aggregator stage can handle. However, the data you are supplying to the Aggregator must be sorted, and the sort order must be specified in the Aggregator.

Could you please let me know whether you are sorting the data before you send it to the aggregator?
Thanks,
Naveen
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Hmmm... a lakh is 10 man, which is 0x186A0. Very clear...

As Clarcombe mentioned, if your data is sorted then the Aggregator stage doesn't need to load and keep every record in memory; it only needs to accumulate values until the next change in the grouping level.

Using sorted input is the best and most efficient method.

The limit that the aggregator stage has is memory. You can roughly estimate that (on unsorted data) the whole data stream needs to be kept in virtual memory until the last row has been read. Check your data size and look at your ulimit value and it will probably be quite clear why your process is aborting.
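To illustrate the difference, here is a Python sketch (not DataStage code; the sample rows are made up). With unsorted input, every group must be held until the last row arrives, so memory grows with the number of distinct groups; with sorted input, each group is emitted as soon as the key changes, so memory stays constant.

```python
from itertools import groupby
from operator import itemgetter

# Sample (key, value) rows, already sorted by key
rows = [("A", 1), ("A", 2), ("B", 5), ("B", 3), ("C", 4)]

def aggregate_unsorted(rows):
    """Unsorted-input approach: every group accumulates in a dict
    until end-of-input, so all groups live in memory at once."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

def aggregate_sorted(rows):
    """Sorted-input approach: only the current group is held; a finished
    group is emitted as soon as the key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

print(aggregate_unsorted(rows))      # {'A': 3, 'B': 8, 'C': 4}
print(dict(aggregate_sorted(rows)))  # same result, constant memory
```

Both produce the same totals; only the memory profile differs, which is why the sorted path survives large volumes that make the unsorted path abort.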
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, there is a definite limit on the number of rows the Aggregator can handle. That limit, of course, isn't a fixed row count; the total size of the rows being aggregated is more the issue.

The only way I've found to successfully agg millions of rows is by presorting them. Then it only needs to keep one 'sort group' in memory at a time and can push rows through when there is a change.

But don't think the Sort stage can handle millions of rows either! You generally need to fall back on command line sort options or database sorts to accomplish that task for you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
VasanthRM
Participant
Posts: 37
Joined: Wed May 11, 2005 3:05 am

Post by VasanthRM »

Thanks for the info.

But is there any limit for the Sort stage? It is quite logical that sorted data can be handled better by the Aggregator.

To add on:

Can I use a Hashed File stage if my requirement is to sort the data, eliminate the duplicates and keep the first incoming record? Which algorithm in the Hashed File stage will suit my request?

Any ideas? This seems very interesting too.
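The keep-the-first-duplicate idea can be sketched in Python (an illustration of the logic, not the Hashed File stage itself; the sample keys are made up):

```python
def keep_first(rows, key_index=0):
    """Yield only the first row seen for each key, in arrival order."""
    seen = set()
    for row in rows:
        key = row[key_index]
        if key not in seen:  # first occurrence wins; later duplicates dropped
            seen.add(key)
            yield row

rows = [("K1", "first"), ("K2", "x"), ("K1", "second"), ("K2", "y")]
print(list(keep_first(rows)))  # [('K1', 'first'), ('K2', 'x')]
```

Note that a plain write to a hashed file overwrites on key, so by default the *last* row per key survives; to keep the first, you have to check for the key (as `keep_first` does) before writing.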
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can use a UV stage (or any other database stage) to sort your data.

If the data are in a file, however, you may find it easier - and faster - to use the UNIX sort command (perhaps as a before-stage subroutine) to effect the sort. Sort by the grouping column(s).

You must then inform the Aggregator stage - on its input link - that the data are sorted by the grouping column(s).
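A minimal sketch of that pre-sort, wrapped in Python for illustration (the file names, comma delimiter, and first-column grouping key are assumptions; as a before-stage command it would just be the `sort` invocation shown in the comment):

```python
import subprocess

def presort(infile, outfile):
    """Sort a comma-delimited file by its first (grouping) column
    using the UNIX sort command, equivalent to:
        sort -t, -k1,1 -o sorted.txt input.txt
    """
    subprocess.run(
        ["sort", "-t", ",", "-k", "1,1", "-o", outfile, infile],
        check=True,
    )
```

The Aggregator's input link would then be told the data are sorted on that first column.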
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:You must then inform the Aggregator stage - on its input link - that the data are sorted by the grouping column(s).
And don't even think about lying to it - the stage will bust you for that. Big time. :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers