Job aborts due to heap size allocation problem

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pdntsap
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Job aborts due to heap size allocation problem

Post by pdntsap »

Hello,

The parallel job has a Join stage that joins millions of records from two data sets, and the job aborts with the following error:

Join_8,2: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit
Join_8,2: Current heap size: 279,734,248 bytes in 7,574 blocks
Join_8,2: Failure during execution of operator logic.
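If I am reading the message correctly, the numbers work out roughly as follows:

805306368 bytes = 768 MB (soft limit on the data segment)
2147483647 bytes ~= 2 GB (hard limit, i.e. 2^31 - 1)
279734248 bytes ~= 267 MB (heap the operator had allocated when it failed)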

Based on other similar posts, I used the ulimit command to check the space allocation on the server:

Change and report the soft limit associated with a resource
Command: ulimit -S
My output: unlimited

Change and report the hard limit associated with a resource
Command: ulimit -H
My output: unlimited

All current limits are reported
Command: ulimit -a
My output:
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) 32768
coredump(blocks) 0
nofiles(descriptors) 2000

So it seems that the soft and hard limits are unlimited on the server, yet the job with the join stage still fails on heap allocation. Is the problem really a heap/memory allocation issue? Any help would be greatly appreciated.
The DataStage server runs on AIX version 5.3.
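One thing I am not sure about: if I read the ksh documentation correctly, ulimit -S and -H without a resource option report only the file size limit, so maybe the data segment needs to be checked explicitly, for example:

ulimit -Sd (soft data segment limit, in kbytes)
ulimit -Hd (hard data segment limit, in kbytes)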

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Those other posts should have mentioned that you need to run the ulimit command inside a job to get accurate values, rather than running it manually at the command line. Did you?
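For example, a quick way to capture it (just a sketch; the output path is only a placeholder, adjust as needed) is to call the shell from the job itself, e.g. via an ExecSH before-job subroutine or an Execute Command activity in a sequence:

ulimit -a > /tmp/ds_ulimit_check.txt 2>&1

That shows the limits the engine processes actually run with, which can differ from what your login shell reports.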
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Please see below for the results of running the ulimit command inside the job:

Command: ulimit -S
My output: unlimited

Command: ulimit -H
My output: unlimited

Command: ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 786432
stack(kbytes) 4194304
memory(kbytes) 32768
coredump(blocks) 2097151
nofiles(descriptors) 2000

The data(kbytes) and coredump(blocks) values are different, but the hard and soft limits are still reported as unlimited. Please let me know what might be causing the heap size error.

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps this might help: viewtopic.php?p=402895
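Also, for what it's worth, the data(kbytes) value you got inside the job works out to:

786432 KB x 1024 = 805,306,368 bytes

which matches the soft limit quoted in your error message, so the data segment limit the job actually sees does not appear to be unlimited.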
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

There are about 13 million records in one file and about 30,000 records in the other. The order of the links does not seem to matter, as I get the same error after changing the link ordering, so I guess I have a lot of duplicates in one of the files. I split the 13-million-record file into files of about 4.5 million records each and still got the same error. Any other workaround suggestions?
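In case it helps the discussion, one rough way to check the join key for duplicates would be to run something like this against flat-file extracts of the two inputs (assuming pipe-delimited files with the join key in the first column; the file names are just placeholders):

cut -d'|' -f1 left_input.txt | sort | uniq -c | sort -rn | head
cut -d'|' -f1 right_input.txt | sort | uniq -c | sort -rn | head

The counts at the top would show how heavily the key is duplicated in each input.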

Thanks.