VOC Error
Moderators: chulett, rschirm, roy
Hello,
Since about a month ago, we have had problems importing DataStage components into our acceptance environment.
We couldn't pin down the cause, but we worked out that the UniVerse repository was unstable.
Yesterday we rebuilt the indices with DS.TOOLS in the Administrator, and that seemed to work. But this morning one of our batch runs (which had run three times before) aborted in several jobs with different errors, which I'm posting here; each error came from a different job:
Program "DSD.StageRun": pc = 1E44, Unable to open the operating system file "DSG_BP.O/DSR_TIMESTAMP.B".
[ENFILE] File table overflow
Program "DSD.StageRun": pc = 1E44, Unable to load file "DSR_TIMESTAMP".
Program "DSD.StageRun": pc = 1E44, Unable to load subroutine.
Attempting to Cleanup after ABORT raised in stage PhaOgAlimDtwTabRefEga3ColA..InterProcess_7.IDENT17
DataStage Phantom Aborting with @ABORT.CODE = 3
Program "DSD.StageRun": pc = 70C, "$DS.GETPID" is not in the CATALOG space.
[ENFILE] File table overflow
Program "DSD.StageRun": Line 233, Incorrect VOC entry for $DS.GETPID.
Program "DSD.StageRun": Line 233, Unable to load subroutine.
Cannot find a job number 0
Attempting to Cleanup after ABORT raised in stage 0
DataStage Phantom Aborting with @ABORT.CODE = 3
Program "DSD.LinkReport": pc = 2BA, Unable to open the operating system file "DSD_BP.O/DSD_AddLinkEvent.B".
[ENFILE] File table overflow
Program "DSD.LinkReport": pc = 2BA, Unable to load file "DSD.AddLinkEvent".
Program "DSD.LinkReport": pc = 2BA, Unable to load subroutine.
Attempting to Cleanup after ABORT raised in stage PhaOgAlimDtwTabRefEga3ColF..REF_OG
DataStage Phantom Aborting with @ABORT.CODE = 3
ds_loadlibrary: error in dlopen of oraoci9.so - libwtc9.so: cannot open shared object file: No such file or directory
The last message is fatal; the others are just warnings.
And finally, one of the jobs simply aborted without any warning or error.
What should we do to fix this problem?
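The `[ENFILE] File table overflow` messages mean the kernel-wide open-file table was exhausted, so a first diagnostic step is to see how many descriptors are actually in use when the jobs run. A minimal sketch, assuming a Linux host (counts only processes visible to the current user; run as root to see everything):

```shell
# Rough count of open file descriptors across all visible processes,
# walked via /proc (Linux-specific).
total=0
for fd_dir in /proc/[0-9]*/fd; do
  n=$(ls "$fd_dir" 2>/dev/null | wc -l)   # descriptors held by this process
  total=$((total + n))
done
echo "open descriptors visible: $total"
```

Comparing this figure against the system-wide maximum while the jobs are running shows how close the machine is to the limit.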
I wonder if you haven't hit the system file table limit, referring to:
[ENFILE] File table overflow
The parameter and location where this system limit is set depend upon which flavor of UNIX you are using.
This system limit needs to be reconfigured if too restrictive. What UNIX do you have?
What does "ulimit -a" show for your DataStage user? And what is the NFILES setting in your sysconfig?
ulimit -a shows:
time(cpu-seconds) unlimited
file(blocks) unlimited
coredump(blocks) 0
data(kbytes) unlimited
stack(kbytes) 8192
lockedmem(kbytes) unlimited
memory(kbytes) unlimited
nofiles(descriptors) 1024
processes 15231
How can I see the sysconfig information? Is it NFILES or NOFILES that you need?
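On Linux there is no single "sysconfig" file for this; the kernel-wide file table is exposed through /proc and sysctl, separately from the per-process `ulimit -n` value. A sketch of where to look (paths are Linux-specific; other UNIXes use different tunables):

```shell
# Per-process open-file limit for this shell:
ulimit -n
# Kernel-wide file table: allocated, free, and maximum handles.
cat /proc/sys/fs/file-nr 2>/dev/null
# The maximum alone (same value as the third field above):
sysctl fs.file-max 2>/dev/null || true
```

ENFILE is raised against the kernel-wide table (fs.file-max), while EMFILE corresponds to the per-process `ulimit -n` limit, so both are worth checking.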
Your user process can only open 1024 files at a time; I'm not sure if the error message is the same, though. Your user stack is pretty small as well.
I think you can use "sysctl" in RedHat, but I am not sure.
I transferred the information to a Unix administrator, who proposed modifying this:
@dstage soft nofile 1024
@dstage hard nofile 65536
in the file /etc/security/limits.conf. They changed this, but when I do ulimit -a, I still get the same:
time(cpu-seconds) unlimited
file(blocks) unlimited
coredump(blocks) 0
data(kbytes) unlimited
stack(kbytes) 8192
lockedmem(kbytes) unlimited
memory(kbytes) unlimited
nofiles(descriptors) 1024
processes 15231
Is there something else we could do?
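One likely reason the change didn't show up: in /etc/security/limits.conf a leading `@` makes the entry apply to a *group* named dstage, not to a user, and pam_limits only applies new values at the next login session, so an existing shell (and any DataStage server processes started from it) keeps the old limits. A hedged sketch, assuming dstage is actually a login user rather than a group:

```
# /etc/security/limits.conf -- per-user form (no leading @)
dstage  soft  nofile  2048
dstage  hard  nofile  65536
```

After editing, log out and back in as dstage (and restart the DataStage engine under that fresh session), then re-run "ulimit -a" to confirm.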
Why not try setting the soft file limit to 2048 and checking to see if it makes a difference?
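Raising the soft limit can also be tried in the current session without touching limits.conf, since a process may raise its own soft limit up to the hard limit. A minimal sketch:

```shell
# Raise the soft descriptor limit for this shell session only.
# The soft limit can be raised at most to the hard limit.
echo "hard limit: $(ulimit -Hn)"
ulimit -Sn 2048 2>/dev/null || echo "2048 exceeds the hard limit"
ulimit -Sn    # soft limit now in effect
```

This only affects the shell it is run in (and its children), so DataStage processes would need to be started from that same session to inherit it.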
No negative impact unless the system is already overloaded.