Sun Solaris & Datastage 7.5 Problems
Moderators: chulett, rschirm, roy
Programs do not really allocate swap space; this is done by the system when pages no longer fit in physical memory.
What could be happening is that you are starting up a number of processes, each of which is using memory space for its programs and data.
Have you added APT_DUMP_SCORE to your jobs and seen how many pids are started up with an 8-node configuration?
vmstat is a good tool for getting a quick snapshot of overall system load, but you should be monitoring your DataStage processes and the amount of memory they use, and there are better tools for that.
Do your jobs do extensive sorting or repartitioning or non-sparse lookups? Particularly the last item can cause large amounts of memory to be used even when only a couple of actual data rows are processed.
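As a cross-check on the APT_DUMP_SCORE output, the same process count can be observed at the OS level. A minimal sketch; the sample ps lines below are invented for illustration, and on a live server you would pipe real `ps -ef` output through the same filter:

```shell
# The bracketed pattern '[o]sh' stops grep from matching its own process entry
# when run against live ps output. Here we feed it three hypothetical ps lines,
# two of which are DataStage parallel-engine (osh) players.
printf '%s\n' \
  'dsadm 101   1 osh -f score' \
  'dsadm 102 101 osh -f score' \
  'root  200   1 sshd' |
grep -c '[o]sh'
```

On the server itself the equivalent one-liner would be `ps -ef | grep -c '[o]sh'`, run while the job is active.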
Hi Arnd,
Thanks for the additional information on this issue.
Here is some information we gathered during this process.
Sun Solaris
It runs 752 processes on 8 nodes.
AIX
It runs 733 processes on 8 nodes.
I thought swap space utilization was directly proportional to the number of processes executed in a job, so I created a job containing only Row Generator and Peek stages (but many of them) and ran it; the job ran successfully in both environments.
Sun Solaris
It runs 1296 processes on 8 nodes.
AIX
It runs 1296 processes on 8 nodes.
After seeing this I am not convinced that swap space is allocated by the OS; otherwise this job should also have failed, since the number of processes is more than a thousand.
I do have a lot of sorting and joins, but the problem here is that one job alone takes 60 GB of swap space irrespective of the number of rows it handles (I ran the same job with 100 records and with 100,000, and the swap utilization was the same).
I am not sure this is entirely an issue with the OS (Sun Solaris), and the test I have done does not prove that either.
Please let me know what other tools I can use to test this and pinpoint exactly where the issue is.
I am surprised that not many people running on Sun Solaris seem to have this issue.
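One tool worth adding to the list: on Solaris, `swap -s` separates swap that is actually allocated from swap that is merely reserved, and it is usually the reservation that balloons. A hedged sketch parsing the `swap -s` summary line; the sample line and its numbers below are invented, and on the server you would capture the real line with `subprocess.run(["swap", "-s"], ...)`:

```python
import re

# Hypothetical `swap -s` output line (format matches what Solaris prints).
sample = ("total: 4194304k bytes allocated + 62914560k reserved = "
          "67108864k used, 69206016k available")

# Pull out the four kilobyte figures in order of appearance.
allocated_k, reserved_k, used_k, available_k = (
    int(n) for n in re.findall(r"(\d+)k", sample))

# Reserved-but-untouched swap is the interesting number: Solaris reserves
# virtual swap for anonymous memory up front, so "used" can look huge even
# when very little has actually been paged out.
print(f"allocated {allocated_k // 2**20} GiB, reserved {reserved_k // 2**20} GiB, "
      f"used {used_k // 2**20} GiB, available {available_k // 2**20} GiB")
```

Sampling this in a loop while the job ramps up shows whether the 60 GB is reservation or genuine allocation.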
Your job is a complex one to use that many processes. I hope that your hardware is sufficiently beefy to support that type of job and that level of parallelism.
Look back at your job that is using 60 GB of swap space regardless of the number of rows processed. I am certain that you are loading a lookup into memory that is taking up most of this space. If you change your APT_CONFIG file to one with only 4 or even 1 node, how is your swap space affected?
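For reference, testing with a single node is just a matter of pointing the job at a configuration file like the one below. This is a sketch only; the fastname and resource paths are placeholders, not values taken from this thread:

```
{
  node "node1"
  {
    fastname "your_host"
    pools ""
    resource disk "/ds/data" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
}
```

Comparing swap usage at 1, 2, 4 and 8 nodes with otherwise identical settings isolates how much of the reservation scales with partition count.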
I suspected that a lookup might be taking all the available space, so I replaced all lookups with Join stages, and the result is the same.
Yes, the job runs fine if I run it using a 4-node or 2-node configuration.
With the 4-node configuration it takes 60 GB of swap space,
and with 8 nodes it fails, even though we have 130 GB of swap.
The problem is that this swap utilization prevents us from running multiple jobs (even simple ones) at the same time, even though we know the data we are dealing with is small, and it makes it hard to estimate how much hardware is needed to run our jobs.
The same job runs fine in the AIX environment, which has only 12 GB of swap space; this is hard to explain to the client.
We have 4 CPUs (dual processor).
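The 4-node versus 8-node numbers above are roughly consistent with a per-partition scaling assumption: if each node holds its own copy of the sort and join buffers, reservation grows about linearly with node count. A small back-of-the-envelope sketch (the linear-scaling assumption is mine, not established in the thread):

```python
def projected_swap_gb(observed_gb: float, observed_nodes: int,
                      target_nodes: int) -> float:
    """Project swap reservation assuming it scales linearly with node count."""
    return observed_gb / observed_nodes * target_nodes

# 60 GB observed at 4 nodes projects to ~120 GB at 8 nodes -- uncomfortably
# close to the 130 GB configured, before any other process reserves anything.
print(projected_swap_gb(60, 4, 8))  # -> 120.0
```

That would explain why 8 nodes fails on a 130 GB swap configuration while 4 nodes squeaks by.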
I am sorry to answer this so late.
My point here is that DataStage on Sun Solaris always takes so much more space than DataStage on AIX, which is unexpected. Also, I think none of these lookups and joins are candidates for sparse lookup, since the input has a huge number of records.
I would be happy to know the answer to the following question:
Why does DataStage on Sun Solaris take such a huge amount of swap space when running a job?
I am hoping to hear from more people with the same environment who have already faced this issue.
Hi Ray,
Here is what I found out from our environment.
etlt01:/home/c6262cn $ swap -l
swapfile dev swaplo blocks free
/dev/md/dsk/d1 85,1 16 65553776 65553776
/dev/vx/dsk/swap_dg/swapvol 292,53000 16 141408240 141408240
etlt01:/home/c6262cn $
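For what it's worth, `swap -l` reports sizes in 512-byte blocks, so the two devices above can be totalled up directly (values copied from the listing):

```shell
# Sum the block counts of the two swap devices shown by `swap -l`,
# then convert 512-byte blocks to GiB with integer arithmetic.
blocks=$((65553776 + 141408240))
echo "${blocks} blocks = $((blocks * 512 / 1024 / 1024 / 1024)) GiB"
```

That works out to roughly 98 GiB of configured device swap, before counting any tmpfs-backed virtual swap.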
Do you think mounting swap on /tmp will solve this issue?
Thanks & Regards
Kannan
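On the /tmp question: on Solaris, /tmp is normally a tmpfs filesystem that is itself backed by virtual swap, so directing anything extra at /tmp draws from the same swap pool rather than adding to it. A quick way to check which filesystem backs /tmp; the df line below is a made-up sample, and on the server you would simply run `df -k /tmp`:

```shell
# Hypothetical Solaris `df -k /tmp` output: the first field of the data row
# names the backing filesystem, which is "swap" when /tmp is swap-backed tmpfs.
printf '%s\n' \
  'Filesystem kbytes used avail capacity Mounted on' \
  'swap 8388608 1024 8387584 1% /tmp' |
awk 'NR==2 {print $1}'
```

If that prints `swap`, then scratch files written to /tmp compete with the job's own swap reservations.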