Sun Solaris & Datastage 7.5 Problems
Moderators: chulett, rschirm, roy
Programs do not really allocate swap space; this is done by the system when pages no longer fit in physical memory.
What could be happening is that you are starting up a number of processes, each of which is using memory space for its programs and data.
Have you added APT_DUMP_SCORE to your jobs and seen how many pids are started up with an 8-node configuration?
vmstat is a good tool for getting a quick snapshot of overall system load, but you should be monitoring your DataStage processes and the amount of memory they use, and there are better tools for that.
Do your jobs do extensive sorting or repartitioning or non-sparse lookups? Particularly the last item can cause large amounts of memory to be used even when only a couple of actual data rows are processed.
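As a cross-check on the APT_DUMP_SCORE output, the same process count can be observed at the OS level. A minimal sketch; the sample ps lines below are invented for illustration, and on a live server you would pipe real `ps -ef` output through the same filter:

```shell
# The bracketed pattern '[o]sh' stops grep from matching its own process entry
# when run against live ps output. Here we feed it three hypothetical ps lines,
# two of which are DataStage parallel-engine (osh) players.
printf '%s\n' \
  'dsadm 101   1 osh -f score' \
  'dsadm 102 101 osh -f score' \
  'root  200   1 sshd' |
grep -c '[o]sh'
```

On the server itself the equivalent one-liner would be `ps -ef | grep -c '[o]sh'`, run while the job is active.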
Hi Arnd,
Thanks for the additional information on this issue.
Here is some information we gathered during this process.
Sun Solaris
It runs 752 processes on 8 nodes.
AIX
It runs 733 processes on 8 nodes.
I thought swap space utilization was directly proportional to the number of processes executed in a job, so I created a job containing only Row Generator and Peek stages (but many of them) and ran it; the job ran successfully in both environments.
Sun Solaris
It runs 1296 processes on 8 nodes.
AIX
It runs 1296 processes on 8 nodes.
After seeing this I am not convinced that swap space is allocated by the OS; otherwise this job should also have failed, since the number of processes is more than a thousand.
I do have a lot of sorting and joins, but the problem here is that one job alone takes 60 GB of swap space irrespective of the number of rows it handles (I ran the same job with 100 records and with 100,000, and the swap utilization was the same).
I am not sure this is entirely an issue with the OS (Sun Solaris), and the test I have done does not prove that either.
Please let me know what other tools I can use to test this and pinpoint exactly where the issue is.
I am surprised that not many people running on Sun Solaris seem to have this issue.
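One tool worth adding to the list: on Solaris, `swap -s` separates swap that is actually allocated from swap that is merely reserved, and it is usually the reservation that balloons. A hedged sketch parsing the `swap -s` summary line; the sample line and its numbers below are invented, and on the server you would capture the real line with `subprocess.run(["swap", "-s"], ...)`:

```python
import re

# Hypothetical `swap -s` output line (format matches what Solaris prints).
sample = ("total: 4194304k bytes allocated + 62914560k reserved = "
          "67108864k used, 69206016k available")

# Pull out the four kilobyte figures in order of appearance.
allocated_k, reserved_k, used_k, available_k = (
    int(n) for n in re.findall(r"(\d+)k", sample))

# Reserved-but-untouched swap is the interesting number: Solaris reserves
# virtual swap for anonymous memory up front, so "used" can look huge even
# when very little has actually been paged out.
print(f"allocated {allocated_k // 2**20} GiB, reserved {reserved_k // 2**20} GiB, "
      f"used {used_k // 2**20} GiB, available {available_k // 2**20} GiB")
```

Sampling this in a loop while the job ramps up shows whether the 60 GB is reservation or genuine allocation.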
Your job is a complex one to use that many processes. I hope that your hardware is sufficiently beefy to support that type of job and that level of parallelism.
Look back at your job that is using 60 GB of swap space regardless of the number of rows processed. I am certain that you are loading a lookup into memory that is taking up most of this space. If you change your APT_CONFIG file to one with only 4 or even 1 node, how is your swap space affected?
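For reference, testing with a single node is just a matter of pointing the job at a configuration file like the one below. This is a sketch only; the fastname and resource paths are placeholders, not values taken from this thread:

```
{
  node "node1"
  {
    fastname "your_host"
    pools ""
    resource disk "/ds/data" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
}
```

Comparing swap usage at 1, 2, 4 and 8 nodes with otherwise identical settings isolates how much of the reservation scales with partition count.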
I suspected that a lookup might be taking all the available space, so I replaced all lookups with Join stages, and the result is the same.
Yes, the job runs fine if I run it using a 4-node or 2-node configuration.
With the 4-node configuration it takes 60 GB of swap space,
and with 8 nodes it fails, even though we have 130 GB of swap.
The problem is that this swap utilization prevents us from running multiple jobs (even simple ones) at the same time, even though we know the data we are dealing with is small, and it makes it hard to estimate how much hardware is needed to run our jobs.
The same job runs fine in the AIX environment, which has only 12 GB of swap space; this is hard to explain to the client.
We have 4 CPUs (dual processor).
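The 4-node versus 8-node numbers above are roughly consistent with a per-partition scaling assumption: if each node holds its own copy of the sort and join buffers, reservation grows about linearly with node count. A small back-of-the-envelope sketch (the linear-scaling assumption is mine, not established in the thread):

```python
def projected_swap_gb(observed_gb: float, observed_nodes: int,
                      target_nodes: int) -> float:
    """Project swap reservation assuming it scales linearly with node count."""
    return observed_gb / observed_nodes * target_nodes

# 60 GB observed at 4 nodes projects to ~120 GB at 8 nodes -- uncomfortably
# close to the 130 GB configured, before any other process reserves anything.
print(projected_swap_gb(60, 4, 8))  # -> 120.0
```

That would explain why 8 nodes fails on a 130 GB swap configuration while 4 nodes squeaks by.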
I am sorry to answer this so late.
My point here is that DataStage on Sun Solaris always takes so much more space than DataStage on AIX, which is unexpected. Also, I think none of these lookups and joins are candidates for sparse lookup, since the input has a huge number of records.
I would be happy to know the answer to the following question:
Why does DataStage on Sun Solaris take such a huge amount of swap space when running a job?
I am hoping to hear from more people with the same environment who have already faced this issue.
Hi Ray,
Here is what I found out from our environment.
etlt01:/home/c6262cn $ swap -l
swapfile dev swaplo blocks free
/dev/md/dsk/d1 85,1 16 65553776 65553776
/dev/vx/dsk/swap_dg/swapvol 292,53000 16 141408240 141408240
etlt01:/home/c6262cn $
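For what it's worth, `swap -l` reports sizes in 512-byte blocks, so the two devices above can be totalled up directly (values copied from the listing):

```shell
# Sum the block counts of the two swap devices shown by `swap -l`,
# then convert 512-byte blocks to GiB with integer arithmetic.
blocks=$((65553776 + 141408240))
echo "${blocks} blocks = $((blocks * 512 / 1024 / 1024 / 1024)) GiB"
```

That works out to roughly 98 GiB of configured device swap, before counting any tmpfs-backed virtual swap.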
Do you think mounting swap on /tmp will solve this issue?
Thanks & Regards
Kannan
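On the /tmp question: on Solaris, /tmp is normally a tmpfs filesystem that is itself backed by virtual swap, so directing anything extra at /tmp draws from the same swap pool rather than adding to it. A quick way to check which filesystem backs /tmp; the df line below is a made-up sample, and on the server you would simply run `df -k /tmp`:

```shell
# Hypothetical Solaris `df -k /tmp` output: the first field of the data row
# names the backing filesystem, which is "swap" when /tmp is swap-backed tmpfs.
printf '%s\n' \
  'Filesystem kbytes used avail capacity Mounted on' \
  'swap 8388608 1024 8387584 1% /tmp' |
awk 'NR==2 {print $1}'
```

If that prints `swap`, then scratch files written to /tmp compete with the job's own swap reservations.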