DataStage blamed again

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ewartpm
Participant
Posts: 97
Joined: Wed Jun 25, 2003 2:15 am
Location: South Africa
Contact:

DataStage blamed again

Post by ewartpm »

We run DataStage jobs daily. The database (Informix) is on a SAN, and the hash files and sequential files are on the AIX server. The jobs are simple seq-->trfm-->seq in most cases, with one or two hash file lookups. The hash files are large; most are over 1,000,000 rows and at least 20 columns wide (not good, I know). :oops:

We are rebooting the server every three or four days because it just starts to swap itself to death. The DataStage jobs are being blamed for consuming all the memory on the server, an average of about (60 x 1024) memory blocks. We have been told by the DBA: 'When DataStage starts, it keeps allocating these memory blocks until all the memory is consumed and the server starts swapping.' :(

We have had a look at our jobs, log file sizes, uvconfig settings, the dslockd size, etc., but all seems OK. The project performance settings are 999 for the hash file read and write caches, and the inter-process buffer size is set at 1024. At any point in time there are probably 10 jobs running concurrently.
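Something like the following, captured when the swapping starts, should show which processes actually hold the memory (a rough sketch; the 'ps gv' column layout can differ between AIX releases):

    # Top memory consumers; RSS is normally the 7th column of 'ps gv' on AIX
    ps gv | sort -rn -k 7,7 | head -10

    # Shared memory segments - DataStage's cache/shared memory shows up here
    ipcs -m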

Informix has been allocated 4GB of RAM on the server, the rest is available for DataStage and AIX.

We have suggested doing a health check on AIX (making sure all available patches are installed) and then re-installing DataStage. Don't know if this will solve the problem. :?

If anyone has had similar experiences, I would like to know how you solved them.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

ewartpm,

Instead of going into great detail, I would recommend monitoring the system when it "starts paging itself to death" and then stopping DataStage. If the swapping does not stop, then DataStage cannot be (directly) at fault, since it has been stopped and has freed its memory.

There are excellent tools for monitoring system performance, and I am surprised that the administrators haven't used any of them; perhaps they just made some broad assumptions.
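For example, a few minutes of the standard AIX tools, captured while the problem is happening, would settle the question (a sketch; exact columns vary by AIX level):

    # Paging activity every 5 seconds - watch the pi/po columns
    vmstat 5

    # Paging space allocation and usage
    lsps -a

    # Global memory snapshot
    svmon -G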

During the execution of your jobs the DataStage processes will grab memory, especially if they are doing things like sorting and aggregating. Is the "swapping to death" happening during your processing or at other times?
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

What version of AIX are you using (including fixpack)?

What type of SAN are the files on?

What is the size of your swap space on UNIX?
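(The first and third of these can be read straight off the box, for example:

    oslevel -r   # AIX release and maintenance level
    lsps -a      # paging space size and percent used

Output formats vary a little between AIX releases.)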
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Let's recap...
1024K times 10 (if all 10 jobs use the buffer only once) is 10MB.
How many hash files do you have loaded into memory, and how much memory do they take (each can potentially approach the 1GB limit)?
Are you using cache sharing for the hash files?

I think you can potentially overload the server well before you get to 10 jobs; it depends entirely on what you have in your design.
Loading 20 hash files of 500MB each, or the equivalent, will require 10GB of memory.
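A rough way to put real numbers on this is to measure the hashed files on disk; a dynamic hashed file is a directory (containing DATA.30 and OVER.30), so something along these lines works - the path here is only an example:

    # Size of each hashed file in KB, largest first
    du -sk /myproject/hashfiles/* | sort -rn | head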

Do you use pre-cached hash files that are always in memory?

So you probably need to rethink your load balancing, or upgrade your resources.

If you can isolate the few jobs with really heavy processing/memory loads, you might schedule them so that they do not run at the same time.
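For example, a small wrapper script can force the two heaviest jobs to run one after the other instead of together (the project/job names and the dsjob path below are made up; -jobstatus makes dsjob wait for the job to finish and return its status):

    #!/bin/ksh
    DSBIN=/usr/dsadm/Ascential/DataStage/DSEngine/bin   # example install path

    # Run the heavy jobs serially; stop if the first one fails
    $DSBIN/dsjob -run -jobstatus MyProject HeavyJob1 || exit 1
    $DSBIN/dsjob -run -jobstatus MyProject HeavyJob2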

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org