Posted: Fri Dec 28, 2007 5:09 am
by ArndW
Programs do not really allocate swap space; this is done by the system when pages no longer fit in physical memory.

What could be happening is that you are starting up a number of processes, each of which is using memory space for its programs and data.

Have you added APT_DUMP_SCORE to your jobs and seen how many pids are started up with an 8-node configuration?

vmstat is a good tool for getting a quick snapshot of overall system load, but you should also be monitoring your DataStage processes and the amount of memory they use, and there are better tools for that.
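
As a rough sketch of per-process monitoring, the snippet below sums resident and virtual sizes from `ps -eo pid,rss,vsz,comm` output. It assumes the parallel engine processes show up with `osh` in the command name, which may differ on your installation; the function name and the sample data are made up for illustration.

```python
def summarize_ps(ps_output, name_filter="osh"):
    """Sum RSS and VSZ (both in KB) for processes whose command contains name_filter."""
    totals = {"count": 0, "rss_kb": 0, "vsz_kb": 0}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        _pid, rss, vsz, comm = fields
        # Some processes report "-" instead of a size; skip those.
        if not (rss.isdigit() and vsz.isdigit()):
            continue
        if name_filter in comm:  # "osh" is an assumption, adjust as needed
            totals["count"] += 1
            totals["rss_kb"] += int(rss)
            totals["vsz_kb"] += int(vsz)
    return totals


# Invented sample in the layout produced by: ps -eo pid,rss,vsz,comm
SAMPLE = """\
  PID   RSS    VSZ COMMAND
 1201 51200 204800 osh
 1202 49152 198656 osh
 1300  2048   8192 sshd
"""

if __name__ == "__main__":
    print(summarize_ps(SAMPLE))
```

On a live system the same function could be fed the text returned by `subprocess.run(["ps", "-eo", "pid,rss,vsz,comm"], capture_output=True, text=True).stdout`, sampled a few times while the job is running.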

Do your jobs do extensive sorting or repartitioning or non-sparse lookups? Particularly the last item can cause large amounts of memory to be used even when only a couple of actual data rows are processed.

Posted: Fri Dec 28, 2007 9:19 am
by thamark
Hi Arnd,

Thanks for the additional information on this issue.

Here is some information we gathered during this process.

Sun Solaris

It runs 752 processes on 8 nodes.

AIX

It runs 733 processes on 8 nodes.

I thought swap space utilization was directly proportional to the number of processes executed in a job, so I created a job containing only Row Generator and Peek stages (but many of them) to test this, and that job ran successfully in both environments.

Sun Solaris

It runs 1296 processes on 8 nodes.

AIX

It runs 1296 processes on 8 nodes.
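
To compare runs like these, the summary line can be pulled out of a job log with a few lines of Python. This is only a sketch: it assumes the score summary appears in the log exactly in the form quoted above, and the function name is hypothetical.

```python
import re

# Matches the summary line as quoted in this thread,
# e.g. "It runs 752 processes on 8 nodes."
SCORE_SUMMARY = re.compile(r"It runs (\d+) process(?:es)? on (\d+) nodes?")


def processes_per_node(log_text):
    """Return (processes, nodes, processes per node) for each summary line found."""
    results = []
    for match in SCORE_SUMMARY.finditer(log_text):
        procs, nodes = int(match.group(1)), int(match.group(2))
        results.append((procs, nodes, procs / nodes))
    return results


if __name__ == "__main__":
    log = (
        "It runs 752 processes on 8 nodes.\n"
        "It runs 1296 processes on 8 nodes.\n"
    )
    for procs, nodes, per_node in processes_per_node(log):
        print(f"{procs} processes / {nodes} nodes = {per_node:.1f} per node")
```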

After seeing this, I am not convinced that swap space is allocated by the OS; otherwise this job should also have failed, since its number of processes is more than a thousand.

I do have a lot of sorting and joins, but the problem here is that one job alone takes 60 GB of swap space irrespective of the number of rows it handles (I ran the same job with 100 records and with 100,000 records, and the swap utilization was the same).

I am not sure this is entirely an issue with the OS (Sun Solaris), and the test I have done doesn't prove that either.

Please let me know what other tools I can use to test this and pinpoint exactly where the issue is.

I am surprised that not many people running on Sun Solaris seem to have this issue.

Posted: Fri Dec 28, 2007 9:27 am
by ArndW
Your job is a complex one to use that many processes. I hope that your hardware is sufficiently beefy to support that type of job and that level of parallelism.

Look back at your job that is using 60 GB of swap space regardless of the number of rows processed. I am certain that you are loading a lookup into memory that is taking up most of this space. If you change your APT_CONFIG file to one with only 4 nodes, or even 1 node, how is your swap space affected?

Posted: Fri Dec 28, 2007 9:39 am
by thamark
I also suspected that the lookups might be taking all the available space, so I replaced every Lookup stage with a Join stage, and the result is the same.

Yes, the job runs fine if I run it using 4 nodes or 2 nodes.

With the 4-node configuration the job takes 60 GB of space.

With 8 nodes it fails, and we have 130 GB of space.
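
A back-of-the-envelope check of these numbers, assuming (and it is only an assumption; nothing in this thread establishes it) that swap usage grows linearly with the node count:

```python
def projected_swap_gb(observed_gb, observed_nodes, target_nodes):
    """Linear extrapolation of swap usage from one node count to another.

    Linear scaling is an assumption made for illustration only.
    """
    return observed_gb * target_nodes / observed_nodes


if __name__ == "__main__":
    available_gb = 130   # figure quoted in the post
    observed_gb = 60     # usage reported for the 4-node configuration
    projected = projected_swap_gb(observed_gb, observed_nodes=4, target_nodes=8)
    print(f"projected 8-node usage: {projected:.0f} GB")
    print(f"headroom left of {available_gb} GB: {available_gb - projected:.0f} GB")
```

Under that assumption the 8-node run would need about 120 GB of the 130 GB, leaving very little for anything else on the machine. That would be consistent with the failure, but it does not prove the cause.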

The problem is that this swap utilization prevents us from running multiple jobs at the same time (even ones that are not complex), even though we know the data volume is small. That makes it hard to estimate how much hardware is needed to run the jobs.

The same job runs fine in the AIX environment, which has only 12 GB of swap space, and we cannot explain that to the client.

We have 4 CPUs and dual processor.

Posted: Fri Dec 28, 2007 9:50 am
by ArndW
Actually, replacing your joins with sparse lookups would show memory usage a lot better if you could try that - with a sparse lookup the reference data is not loaded to memory at all.
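
To illustrate the difference, here is a small Python sketch in which an in-memory SQLite table stands in for the reference database. It is not how DataStage implements either lookup type; the table, keys and function names are invented for the example.

```python
import sqlite3


def build_reference(n_rows):
    """Create a stand-in reference table with n_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ref (k INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany(
        "INSERT INTO ref VALUES (?, ?)",
        ((i, f"value-{i}") for i in range(n_rows)),
    )
    return conn


def normal_lookup(conn, input_keys):
    """Load the whole reference set first, then probe it in memory."""
    table = dict(conn.execute("SELECT k, v FROM ref"))
    # Memory cost depends on the reference size, not on len(input_keys).
    return [table.get(k) for k in input_keys], len(table)


def sparse_lookup(conn, input_keys):
    """Send one query per input row; no reference rows are held in memory."""
    out = []
    for k in input_keys:
        row = conn.execute("SELECT v FROM ref WHERE k = ?", (k,)).fetchone()
        out.append(row[0] if row else None)
    return out, 0


if __name__ == "__main__":
    conn = build_reference(100_000)
    keys = [3, 7, 999_999]  # only a couple of input rows
    print(normal_lookup(conn, keys))
    print(sparse_lookup(conn, keys))
```

Both functions return the same values; the second element of each result is the number of reference rows held in memory. The trade-off is one database round trip per input row in the sparse case.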

Posted: Thu Jan 03, 2008 12:13 pm
by thamark
Sorry for the late reply.

My point here is that DataStage on Sun Solaris always takes far more space than DataStage on AIX, which is not expected. I also think these lookups and joins are not candidates for sparse lookup, since the input has a huge number of records.

I would be happy to know the answer to the following question.

Why does DataStage on Sun Solaris take such a huge amount of swap space when it is running the job?

I am hoping to hear from more people who have already faced this issue in the same environment.

Posted: Thu Jan 03, 2008 12:25 pm
by ArndW
This particular problem should be submitted to your support provider. I have worked on Solaris installations and never noticed swap or allocated space issues; but that was prior to 7.5.

Posted: Thu Jan 03, 2008 12:44 pm
by thamark
We have already raised this issue with the support team, and they say it is a Sun Solaris issue. To know for sure, it would be nice to hear from others who have experienced the same thing.

Posted: Thu Jan 03, 2008 3:38 pm
by ray.wurlod
A number of Solaris sites I've worked on have had swap mounted on /tmp, which I always felt was an odd practice.

Posted: Tue Jan 08, 2008 12:01 pm
by thamark
Hi Ray,

Here is what I found in our environment.

etlt01:/home/c6262cn $ swap -l
swapfile                     dev        swaplo  blocks     free
/dev/md/dsk/d1               85,1       16      65553776   65553776
/dev/vx/dsk/swap_dg/swapvol  292,53000  16      141408240  141408240
etlt01:/home/c6262cn $
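
For reference, `swap -l` on Solaris reports sizes in 512-byte blocks, so the figures above can be converted with a short script. This is a sketch with a hypothetical function name that parses the output pasted in as text:

```python
BLOCK_BYTES = 512  # swap -l on Solaris counts 512-byte blocks
GIB = 2 ** 30

SWAP_L = """\
swapfile dev swaplo blocks free
/dev/md/dsk/d1 85,1 16 65553776 65553776
/dev/vx/dsk/swap_dg/swapvol 292,53000 16 141408240 141408240
"""


def parse_swap_l(output):
    """Return one dict per swap device with total and free size in GiB."""
    entries = []
    for line in output.strip().splitlines():
        fields = line.split()
        # Skip the header, shell prompts and anything else that is not a data row.
        if len(fields) != 5 or not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        path, _dev, _swaplo, blocks, free = fields
        entries.append({
            "swapfile": path,
            "total_gib": int(blocks) * BLOCK_BYTES / GIB,
            "free_gib": int(free) * BLOCK_BYTES / GIB,
        })
    return entries


if __name__ == "__main__":
    devices = parse_swap_l(SWAP_L)
    for d in devices:
        print(f"{d['swapfile']}: {d['total_gib']:.1f} GiB total, {d['free_gib']:.1f} GiB free")
    print(f"all devices: {sum(d['total_gib'] for d in devices):.1f} GiB")
```

Both devices show free equal to blocks, so nothing was paged out to them when the command was run; together they come to roughly 98.7 GiB.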

Do you think mounting swap on /tmp will solve this issue?

Thanks & Regards
Kannan

Posted: Tue Jan 08, 2008 4:02 pm
by ray.wurlod
Definitely not.