data segment (heap) size

Ratan Babu N
Participant
Posts: 34
Joined: Tue Dec 27, 2005 12:13 am

data segment (heap) size

Post by Ratan Babu N »

Hi,
When I run a job that loads data into a DB2 stage, it aborts with the following messages.


Db2udbXXX,2: The current soft limit on the data segment (heap) size (2147483645)
is less than the hard limit (2147483647), consider increasing the heap size limit


Db2udbXXX,2: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.


Under what circumstances will it show these messages?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Typically when your data segment size (as set by the ulimit command) is not large enough. Get your UNIX administrator to make its default unlimited.
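
For example (a minimal sketch, assuming a ksh or bash shell; the exact flags and whether you need root to raise the hard limit vary by UNIX flavour):

    ulimit -a             # show all current limits for this shell
    ulimit -d unlimited   # raise the data segment (heap) limit
    ulimit -s unlimited   # raise the stack limit

The limits have to be in effect in the environment the parallel engine actually runs under (for example the DataStage owner's login profile or the dsenv file), not just in an interactive session.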
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Daddy Doma
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

On UNIX, does this error message relate to the data limit or the file limit?

I have an Aggregator stage that has to handle up to 40 million records, and I get the same error. The following warning messages then show in the log:
  • My current heap size= 1,856,298,288 bytes in 35,701,573 blocks.
  • Followed by "Failure in operator logic" for the aggregator stage.
I checked "ulimit -a" and it shows that whilst my time(seconds) and data(kbytes) are unlimited, my file(blcoks) and coredump(blocks) are set as 2097151.

Thanx,

Zac.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At least data and stack should be set to "unlimited" for parallel jobs.
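
To make that stick for the engine's owner account rather than just the current shell, the per-user limits file is usually the place (a sketch only; the stanza below assumes AIX's /etc/security/limits and a dsadm owner account, which is an example name - on Linux the equivalent entries go in /etc/security/limits.conf):

    dsadm:
        data = -1
        stack = -1

In the AIX limits file a value of -1 means unlimited; the DataStage engine needs to be restarted (or the user logged out and back in) before the new limits are picked up.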
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

Zac,
When aggregating large volumes of data, a good approach is to set the Aggregator's method to "sort" and to sort the records on the grouping keys just before the Aggregator stage.
Om Prakash


"There are things that are known, and there are things that are unknown, and in between there are doors"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Indeed. Guidelines suggest that you should use "hash table" only with fewer than about 1000 distinct values of the grouping column(s) per MB of memory.
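
As a rough worked example (the distinct-group count is invented purely for illustration): suppose the 40 million input rows collapse to about 2,000,000 distinct grouping-key values. At roughly 1,000 groups per MB, the hash method would need in the order of

    2,000,000 groups / 1,000 groups per MB = ~2,000 MB

of heap just for its table, which is right at the 2 GB data segment ceiling shown in the original error. The sort method only ever holds the current group in memory, so its footprint stays flat no matter how many groups there are.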
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Daddy Doma
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

Thanks Om, Thanks Ray,

I've looked at my jobs and identified areas where I can fix this aggregation issue. I will add a Sort stage and repartition before each Aggregator and use the Sort Method.

This is a lesson I will not forget - things were fine when developing with only 100 records, but when I ran with production volumes (up to 52 million records) the problems started!

Thanx,

Zac.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Zac,

As Ray points out, it is not the number of input records that matters but the number of result groups you will get. If that approaches 1,000 distinct groups per MB of available memory, use the sort method for the aggregation.
Don't forget that 'available memory' is shared with any other stages running alongside the Aggregator.