
Abnormal Termination - add_to_heap() Unable to allocate memory

Posted: Mon Oct 03, 2005 9:38 am
by rsrikant
Hi,

I get these two errors in my Director log when I run my jobs.

1. Abnormal termination of stage. (Error)

This job has a lot of hash file lookups, around 10 of them.
The job reads from a sequential file, looks up against these 10 hash files, and loads into Oracle as well as a couple of hash files.

The same job runs on our DEV and TEST boxes, but in PROD I get this abnormal termination error and the job aborts.

I split the job into two, with 5 hash file lookups in each. After the split, both jobs complete without aborting.

Any idea why this error comes up on one box alone?



2. add_to_heap() - Unable to allocate memory (Warning)

This job reads from Oracle and loads into two hash files. Each hash file has around 15 million records.

Once it reaches around 2 million records it gives this warning, and the job continues.

Both hash files are dynamic and write caching is enabled.

This warning comes on all three DEV / TEST / PROD boxes.

Any idea what settings I need to change to avoid this warning? Are there any memory / kernel / shared memory settings I can work on to avoid it?


Thanks,
Srikanth

Posted: Mon Oct 03, 2005 6:35 pm
by kduke
Srikanth

What is the size of the file? Usually this is a corrupted hash file; that typically happens when you have a system crash or run out of disk space. If your hash file is created in the account, then at TCL do

DELETE.FILE MyHashFile
or
CLEAR.FILE MyHashFile

If it is a repository file like DS*, RT* or something important like VOC then you have problems. If a file is corrupt then you can usually tell by counting records:

COUNT MyHashFile
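
If you are not sure how to get to TCL: on the server, something like this usually works (the exact paths depend on your install, so treat this as a sketch):

. $DSHOME/dsenv
cd /path/to/your/Project
$DSHOME/bin/uvsh

Then run the commands above against the file, and QUIT to get back out.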

Re: Abnormal Termination - add_to_heap() Unable to allocate memory

Posted: Mon Oct 03, 2005 6:42 pm
by chulett
rsrikant wrote:2. add_to_heap() - Unable to allocate memory (Warning)

This job reads from Oracle and loads into two hash files. Each hash file has around 15 million records.

Once it reaches around 2 million records it gives this warning, and the job continues.

Both hash files are dynamic and write caching is enabled.

This warning comes on all three DEV / TEST / PROD boxes.

Any idea what settings I need to change to avoid this warning? Are there any memory / kernel / shared memory settings I can work on to avoid it?
Turn OFF write caching. As best as we can tell, this annoying "warning" shows up once the cache fills. The other option would be to bump your default write cache size for the Project up high enough to stop this message from appearing - but that change will affect all jobs.

Posted: Mon Oct 03, 2005 6:44 pm
by kduke
If your hash file is over 2GB then you need to add the 64BIT option. Look at the OS level and add the sizes of DATA.30 and OVER.30. I would say make it 64-bit no matter what; this is a large hash file, so why worry about it? If it does this on all 3 boxes then it has to be too big for a 32-bit hash file.
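
To check the size from the OS, something along these lines should do it (the path is just an example; point it at wherever the hashed file actually lives):

ls -l /your/Project/MyHashFile/DATA.30 /your/Project/MyHashFile/OVER.30

Add the two sizes together; anywhere near 2GB and you are at the 32-bit limit. To convert an existing dynamic file to 64-bit addressing, from TCL in the project:

RESIZE MyHashFile * * * 64BIT

If I remember the syntax right, the three asterisks keep the current type, modulus and separation and only change the addressing.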

Posted: Mon Oct 03, 2005 8:42 pm
by ray.wurlod
Or find some options to make it less than 2GB, such as loading it only with columns and rows that are actually required in the job.

Posted: Mon Oct 03, 2005 9:54 pm
by rsrikant
Hi,

Thanks for the replies.

The hash file is not that big. It has very few columns and it is below 1 GB. The warnings start when the hash file reaches around 100 MB and they keep coming once in a while until the job completes.

Craig -- how do I increase the write cache at the project level? If I turn off write caching the performance is very slow.

Kim -- are you talking about the abnormal termination error? If so, I deleted the hash files from the command prompt and tried running the job, but I still get the abnormal termination on the PROD box alone.

For the abnormal termination error, this job runs fine on the DEV and TEST boxes; only on the PROD box do I get the error. Once I split the job into two to distribute the hash lookups between them, the problem went away.
I want to know why it occurs on the PROD box alone. Is there some memory setting on the PROD box that does not allow 10 hash files to be open at a time, or something similar?


Thanks,
Srikanth

Posted: Mon Oct 03, 2005 10:14 pm
by chulett
rsrikant wrote:Craig -- how do I increase the write cache at the project level? If I turn off write caching the performance is very slow.
I hardly ever have write caching turned on and I can get very high speed performance from hashed file writes - provided the hashed file is properly pre-created.
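
By 'pre-created' I mean building the hashed file ahead of time with a sensible minimum modulus instead of letting the stage create a default one that has to split groups all through the load. Something along these lines at TCL, if I have the syntax right (the modulus here is only an illustration, size it for your own data):

CREATE.FILE MyHashFile DYNAMIC MINIMUM.MODULUS 600000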

That being said, if you want to try bumping the write cache size, it is done via the Administrator from the Tunables tab of each Project, from what I recall. The effect of a change there is immediate; nothing 'extra' needs to be done.

Posted: Tue Oct 04, 2005 6:34 am
by kduke
That is really odd. You fixed it in DEV and TEST but not PROD? Did you change your Oracle client? If you did then I would suggest that you have different versions on the different machines. Maybe you have a memory leak in your Oracle client.

Posted: Tue Oct 04, 2005 9:00 am
by rsrikant
Kim -- I believe the problem is with the number of hash files and not with the Oracle client, because once I split the job into two and reduced the hash lookups per job, the jobs run in PROD as well. Is my understanding wrong?

Thanks, Craig. I found where to change the write cache limit in Administrator.

Thanks,
Srikanth

Posted: Tue Oct 04, 2005 4:18 pm
by ray.wurlod
Maybe there are enough rows in production to exceed the hashed file cache size limit (default 128MB) but not in development or test environments?

Try increasing the read and write cache sizes in production, using Administrator client, Tunables tab.

Posted: Tue Oct 04, 2005 4:57 pm
by kduke
Srikanth

The cache size can only be set to what the shmtest command will allow; beyond that you get these kinds of errors. shmtest will tell you what the parameters in uvconfig can be set to. You must then do a uvregen, and then stop and restart DataStage.
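
From memory, the sequence on the server looks roughly like this (run as the DataStage administrative user, with no jobs running; exact paths depend on your install):

cd $DSHOME
. ./dsenv              # pick up the engine environment
bin/shmtest            # reports what shared memory the OS will actually allow
vi uvconfig            # adjust the tunables within those limits
bin/uvregen            # regenerate the binary configuration from uvconfig
bin/uv -admin -stop    # stop DataStage
bin/uv -admin -start   # start it back up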

You need to open a ticket with IBM support on this; it should not be happening. Your DEV and TEST boxes are different machines from PROD, so uvconfig will be different, as will the results of shmtest.

You are correct in splitting the file, because if the file exceeds the cache limit then it gives you a warning and reads it from disk rather than from memory. That is a very clever solution. If you have a natural way to split your keys, then why not keep 2 files in memory? It will run a lot faster with 2 files in memory and 2 lookups than with one lookup from disk.

I will look for my notes on shmtest. I am doing all this from memory and my memory is not as good as it used to be. Maybe we can get a full Wurlod.

No matter. You are on the right track. Let us know how you solve it.

If you reinstalled or upgraded DataStage then your uvconfig file got overwritten and went back to the defaults, which are not optimal.