
Job aborts due to heap size allocation problem

Posted: Tue Aug 16, 2011 3:45 pm
by pdntsap
Hello,

The parallel job has a Join stage joining millions of records from two data sets, and the job aborts with the following error:

Join_8,2: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit
Join_8,2: Current heap size: 279,734,248 bytes in 7,574 blocks
Join_8,2: Failure during execution of operator logic.

Based on other similar posts, I used the ulimit command to check the resource limits on the server:

Change and report the soft limit associated with a resource
Command: ulimit -S
My output: unlimited

Change and report the hard limit associated with a resource
Command: ulimit -H
My output: unlimited

All current limits are reported
Command: ulimit -a
My output:
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) 32768
coredump(blocks) 0
nofiles(descriptors) 2000

So it seems that the soft and hard limits are unlimited on the server, but the job with the Join stage still fails due to heap allocation. Is the problem still heap/memory related? Any help would be greatly appreciated.
The DataStage server runs on AIX version 5.3.
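
In case it matters, since plain ulimit -S and ulimit -H default to the file size resource, a check aimed specifically at the data segment (heap) would look something like this under ksh (just a sketch; -d reports kilobytes):

# ulimit with no resource flag defaults to -f (file size),
# so -d is needed to see the data segment (heap) limits.
ulimit -Sd    # soft data segment limit, in KB
ulimit -Hd    # hard data segment limit, in KB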

Thanks.

Posted: Tue Aug 16, 2011 4:52 pm
by chulett
Those other posts should have mentioned that you need to run the ulimit command from inside a job to get accurate values, rather than running it manually at the command line. Did you?
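
For example, something like this set up via the ExecSH before-job subroutine (or any other way of running a shell command from within the job) would put the limits the engine actually runs under into the job log; a rough sketch only:

# Set as the ExecSH before-job subroutine's input value so the output
# shows up in the job log rather than in your interactive shell.
ulimit -a; ulimit -Sd; ulimit -Hd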

Posted: Wed Aug 17, 2011 7:21 am
by pdntsap
Please see below for the results of running the ulimit command inside the job:

Command: ulimit -S
My output: unlimited

Command: ulimit -H
My output: unlimited

Command: ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 786432
stack(kbytes) 4194304
memory(kbytes) 32768
coredump(blocks) 2097151
nofiles(descriptors) 2000

The data(kbytes) and coredump(blocks) values are different from what I got at the command line, but the hard and soft limits still report as unlimited. Please let me know what might be causing the heap size error.
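
For what it's worth, the data(kbytes) value reported inside the job does seem to work out to exactly the soft limit quoted in the error message:

786432 KB * 1024 = 805306368 bytes    (the soft limit in the error)
2147483647 bytes = 2 GB - 1 byte      (the hard limit, the 32-bit ceiling)

So the limit the job sees appears to come from the data segment setting in effect for the engine environment, even though the command-line values show unlimited.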

Thanks.

Posted: Wed Aug 17, 2011 9:41 am
by chulett
Perhaps this might help: viewtopic.php?p=402895

Posted: Wed Aug 17, 2011 10:31 am
by pdntsap
There are about 13 million records in one file and about 30,000 records in the other. The order of the links does not seem to matter, as I get the same error even after changing the link ordering, so I guess I have a lot of duplicates in one of the files. I split the 13-million-record file into files of about 4.5 million records each and still got the same error. Any other workaround suggestions?

Thanks.