Heap Allocation Failure Resolution



VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Heap Allocation Failure Resolution

Post by VCInDSX »

Hi,
I have a parallel job that reads an XML file via the XML Input stage and copies the elements (according to XPaths) to a target database.
This job works fine for smaller files (< 90 MB, about 1 million records).

The real file that I would like to process is 520 MB and has about 4.4 million records. When I run the job against it, the job aborts.
In certain instances it logs XML_Input_1,0: Caught unknown exception from runLocally(). [api\operator_rep.C:3786]
Other times I see XML_Input_1,0: Caught exception from runLocally(): APT_BadAlloc: Heap allocation failed.. [api\operator_rep.C:3775]

Searching various posts, the common recommendation was to check the size limits for the user account, though only a few of those posts dealt with Windows. I did follow the suggestions to monitor disk usage and other parameters through Task Manager. The job fails whenever the page file size grows a little above 4 GB. Further searching turned up an article on the MS site, http://support.microsoft.com/kb/237740. I checked the DS Server box and the page file is set to a minimum of 2048 MB and a maximum of 4096 MB.

I have very little experience dealing with performance and resource issues so far. Is increasing the page file size (or creating more page files) the only remedy, or are there other ways to overcome this situation?

The DS Server is on a Win 2003 Server box that has 8 CPUs and 8 GB RAM.

Your invaluable insights are greatly appreciated and thanks in advance for your time.
-V
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Based on what you've posted, it sure sounds to me like increasing the page file size would be the next thing to try. According to the linked KB article, you may be able to create one greater than 4GB if you are running SP1 with the /PAE switch - otherwise you'll need to create multiple page files.
-craig

"You can never have too many knives" -- Logan Nine Fingers
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Thanks Craig.

I checked the box and it is SP1 with PAE enabled.
I wanted to check with a few folks here before I did anything to the server. Of course, this will go in as a service request to the admins, with some details on why it is required, as it involves a reboot, I believe.

Will update the results once this gets implemented.

So, the XML Input Stage is trying to grab the entire XML into memory before it can process the nodes and pass them to the rest of the stages, correct?
-V
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, from what I understand, the entire XML file is loaded into memory to be parsed, depending on the methodology used. At least that has been my experience on the Server side of the product; I'm unsure how a 'folder-less' PX implementation would affect things. Perhaps Ernie can shed some light on the gory details if he sees this post.

I don't have any PX-specific XML experience and none with DataStage on Windows (only UNIX), so I'm relying on the details you posted, which all seem thorough and support your conclusion. Curious if anyone has actually dealt with this problem and what advice they may have...
-craig

"You can never have too many knives" -- Logan Nine Fingers
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Thank you, Craig. I have submitted a request to the support group to check if they have any comments as well. I will post here when I hear something.

I wonder how the XML Output stage would behave in such cases. Even if one were to chunk out the various fragments, at some point the XML Output stage has to combine them in memory before spitting out the XML, correct? Unless the XML chunks are written into individual files and concatenated at the OS level (a rough sketch of that idea is below).
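To illustrate what I mean by concatenating at the OS level, here is a minimal sketch in Java; the chunk file names and the <records> root element are made up for illustration. Each chunk file would hold only its own fragments, and the combined document is assembled by streaming one chunk at a time, so the complete XML never has to be held in memory.

Code:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Sketch only: stitch pre-written XML chunk files into one document without
// ever loading the whole thing into memory. File names are hypothetical.
public class ConcatChunks {
    public static void main(String[] args) throws IOException {
        List<Path> chunks = Arrays.asList(
                Paths.get("chunk_001.xml"),
                Paths.get("chunk_002.xml"),
                Paths.get("chunk_003.xml"));

        try (OutputStream out = Files.newOutputStream(Paths.get("combined.xml"))) {
            out.write("<records>\n".getBytes());   // single root element for the combined file
            for (Path chunk : chunks) {
                Files.copy(chunk, out);            // streams the chunk; nothing is buffered whole
            }
            out.write("</records>\n".getBytes());
        }
    }
}

The chunk files themselves would contain only the repeating fragments (no XML declaration or root element), otherwise the combined file would not be well-formed.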
-V
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi Group,
After a longggg journey of support tickets, page file adjustments, testing and feedback, we have decided to take a workaround approach - a few approaches, actually.

1. Convert the input XML to delimited text in a pre-processing step (a Perl script that parses the file in SAX mode and generates a text file). This text file is then loaded using an RCP job that we already have for loading other text files. (A rough Java sketch of the SAX idea is shown after this list.)

2. Some other folks have decided to chunk the XML file using similar tricks, breaking the input XML into smaller XML files and then processing those files.

3. Where possible, we decided to check with our suppliers to see if they can provide smaller files, but that was not a guaranteed solution.
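For reference, here is a minimal sketch of the SAX idea behind option 1, written in Java for illustration (our actual script is in Perl). The element names (record, id, name) and the pipe delimiter are made up; the point is only that SAX fires events element by element, so the whole document never has to fit in memory the way it does with DOM.

Code:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: stream a large XML file and write one pipe-delimited line per
// <record> element. Element names and output layout are hypothetical.
public class XmlToDelimited extends DefaultHandler {
    private final BufferedWriter out;
    private final StringBuilder text = new StringBuilder();
    private String id;
    private String name;

    public XmlToDelimited(BufferedWriter out) { this.out = out; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0);                         // reset the text buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);            // accumulate element text
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        try {
            if ("id".equals(qName))   { id = text.toString().trim(); }
            if ("name".equals(qName)) { name = text.toString().trim(); }
            if ("record".equals(qName)) {          // one delimited row per record
                out.write(id + "|" + name);
                out.newLine();
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input XML, args[1] = output delimited file
        try (BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new File(args[0]), new XmlToDelimited(out));
        }
    }
}

The actual pre-processing script does the equivalent in Perl; the generated text file is then fed to the existing RCP load job.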

IBM confirmed that the XML Input stage uses a DOM object, which loads the entire file into memory and therefore exhausts resources on large files. There was also mention of the 32-bit architecture (this being a Windows 2003 server) and the 4 GB limits at the OS level. As I don't have access to 64-bit systems, I could not verify this.

Based on MSDN documentation, increasing the page file should help applications that demand more memory. We increased the page file to 12 GB in addition to the 8 GB of physical RAM. In spite of this, we were not able to parse even a 300 or 400 MB file successfully in a DS job.

I am sure others may have had a different experience with this one. It would be great if they could throw some light on the subject.

Please let me know if you need any additional details on the tests carried out.
-V
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Hi,
With a 32-bit Windows OS, even though you have 8 GB of RAM on the machine, it won't use more than about 3.5 GB; it treats the machine like a three-lane freeway. More to the point, a single 32-bit process gets only a 4 GB virtual address space (2 GB of it user-addressable by default), so a bigger page file won't help one memory-hungry parser process. Does anyone in house know how to use Java to read the XML using SAX instead of DOM?
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi lstsaur,
Thanks for the follow-up. There are folks here who can handle XML via Java; I can give it a shot myself with some refresher reading.
Let me know your thoughts/suggestions.

Thanks for your time.
-V