Weird Problem in Server jobs



vinodlakshmanan
Participant
Posts: 82
Joined: Wed Jul 14, 2004 7:21 am
Location: India

Post by vinodlakshmanan »

There are a host of problems, I'll relate them all in steps:
1. There is a server job which reads from and writes to Teradata tables, with a lot of transformations. This job was running fine until a couple of days ago. The first problem was that the job started aborting due to interrupted socket calls for no apparent reason.
2. When I tried to view the log in Director, Director started hanging. Hence, I used dsjob from the console to view the logs, which was working fine.
3. Since yesterday evening, whenever I open the job in Designer or try to view the log in Director, an error pops up: "Failed to Open RT_LOGnn File". However, this RT_LOGnn file does exist in $DSHOME/../Project/<Project dir>.
4. Since yesterday evening, I have not been able to view the logs using dsjob either. I get the following error:

Code:

ERROR: Failed to open job

Status code = -1004?
5. I tried using uv/uvsh/dssh commands, but got the following error:

Code:

/usr/lib/dld.sl: Can't find path for shared library: libUtilWSClient.sl
/usr/lib/dld.sl: No such file or directory
Abort(coredump)
I had posted a query about this on the forum before. The problem was not solved by modifying the SHLIB_PATH variable. This problem (of uv not working) occurs on 2 servers where DS 7.0 is installed, one onsite and the other here offshore.
6. Since this morning, I have not been able to view the affected jobs in Director, but they still appear in the repository window of Designer :shock:

I tried creating copies of the affected jobs, but the problem occurs on the copies too. :?:
Please provide a solution.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

RT_LOGnn is almost certainly corrupted. Did it reach 2GB? Have you been purging it regularly?

A very large log can give the impression of Director hanging; it performs a full table scan to find those records that satisfy the current filter setting. It wasn't really hanging; it was waiting. And you were being impatient, not realising what was going on.

In an Administrator client Command window, execute the command

Code:

CLEAR.FILE RT_LOGnn
which might provide a quick fix. You may also need to re-compile the job. If clearing the log doesn't work, post again.

Set auto-purge on, to "last 1 job run", for this job. Find out why it's logging so many entries. Fix the parts that are generating warnings. Disable the "testing" parts of the code that are logging informational messages. And so on. Keep the log as small as possible.
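
A rough way to keep an eye on the 2GB boundary from the shell is sketched below. It assumes that RT_LOGnn is stored as a directory of ordinary files under the project directory (for a dynamic hashed file these are typically DATA.30 and OVER.30) and that the limit applies to each of those files separately. The function name and the example path are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report how close each file inside a hashed-file directory
# is to the 2GB boundary mentioned above (2^31 bytes).
LIMIT_BYTES=2147483648

# Print "<file> <bytes> <percent-of-limit>" for every regular file in a directory.
# hashed_file_usage is a hypothetical helper, not a DataStage command.
hashed_file_usage() {
    dir=$1
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            bytes=$(wc -c < "$f" | tr -d ' ')
            pct=$(( bytes * 100 / LIMIT_BYTES ))
            echo "$(basename "$f") $bytes $pct"
        fi
    done
    return 0
}

# Example call (hypothetical path):
#   hashed_file_usage /path/to/project/RT_LOG294
```

A percentage that keeps climbing between runs suggests the purge settings are not keeping up with the volume of log entries.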

The problem with libUtilWSClient.sl is almost certainly one of SHLIB_PATH or LD_LIBRARY_PATH. Where is libUtilWSClient.sl on your system? Is this directory represented in the library search path?
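
The search-path question above can be checked mechanically. This is a minimal sketch for a POSIX shell; find_lib_in_path is a hypothetical helper written for this post, not a DataStage command.

```shell
#!/bin/sh
# Print every directory in a colon-separated search path that contains
# the named library file. Prints nothing if no directory contains it.
find_lib_in_path() {
    lib=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # Skip empty entries such as those produced by "a::b".
        if [ -n "$dir" ] && [ -f "$dir/$lib" ]; then
            echo "$dir"
        fi
    done
    IFS=$old_ifs
    return 0
}

# Example call, using the variable as seen by the current shell:
#   find_lib_in_path libUtilWSClient.sl "$SHLIB_PATH"
```

If the call prints nothing, the value of the variable in the shell that launches uv/dssh does not include a directory holding the library, regardless of what is written in dsenv.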

Do you execute the $DSHOME/dsenv script as an interactive user before attempting to use the $DSHOME/bin/dssh command?
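
One thing worth checking here is how the script is run. Variables set by an environment script only reach the current shell when the script is sourced with ".", not when it is run as a child process. The sketch below shows the difference with a stand-in file; DEMO_LIB_PATH and its value are invented for the demonstration and have nothing to do with the real contents of dsenv.

```shell
#!/bin/sh
# Stand-in for an environment script: a hypothetical file that exports one variable.
demo_env=$(mktemp)
echo 'DEMO_LIB_PATH=/hypothetical/lib; export DEMO_LIB_PATH' > "$demo_env"

# Running the file as a child process: the variable is set in the child only.
unset DEMO_LIB_PATH
sh "$demo_env"
after_exec=${DEMO_LIB_PATH:-unset}

# Sourcing the file with ".": the variable is set in the current shell.
. "$demo_env"
after_source=${DEMO_LIB_PATH:-unset}

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$demo_env"

# With DataStage the equivalent would be something like:
#   cd "$DSHOME" && . ./dsenv
```

After sourcing, echo $SHLIB_PATH in the same shell shows what dssh will actually inherit.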
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vinodlakshmanan
Participant
Posts: 82
Joined: Wed Jul 14, 2004 7:21 am
Location: India

Post by vinodlakshmanan »

ray.wurlod wrote:
RT_LOGnn is almost certainly corrupted. Did it reach 2GB? Have you been purging it regularly?

A very large log can give the impression of Director hanging; it performs a full table scan to find those records that satisfy the current filter setting. It wasn't really hanging; it was waiting. And you were being impatient, not realising what was going on.

Yes, auto-purge is on and I did execute the clear command from Administrator. For 2 log files, I got the following errors:

Code:

Internal File Corruption detected during file open!
File must be repaired possible truncation
hsize: 2048
bsize: 2048
fsize:  4013056
Cannot open file "RT_LOG294"
***Processing cannot continue***
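
The figures in that message can be sanity-checked with simple arithmetic. The sketch below assumes that hsize is the header size, bsize the block size and fsize the file size, all in bytes, and that a healthy file consists of the header plus a whole number of blocks. That interpretation is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Values copied from the error message above.
hsize=2048
bsize=2048
fsize=4013056

# Assumption: a healthy file is the header plus a whole number of blocks.
data_bytes=$(( fsize - hsize ))
whole_blocks=$(( data_bytes / bsize ))
leftover=$(( data_bytes % bsize ))

echo "whole blocks after header: $whole_blocks"
echo "leftover bytes: $leftover"
```

Under that assumption the file ends 1024 bytes into a block rather than on a block boundary, which would be consistent with the "possible truncation" wording.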

ray.wurlod wrote:
The problem with libUtilWSClient.sl is almost certainly one of SHLIB_PATH or LD_LIBRARY_PATH. Where is libUtilWSClient.sl on your system? Is this directory represented in the library search path?

Yes. libUtilWSClient.sl is located at:
/ascential/apt/Ascential/DataStage/DSEngine/uvdlls/libUtilWSClient.sl
/ascential/apt/Ascential/DataStage/DSEngine/lib/libUtilWSClient.sl
The SHLIB_PATH in dsenv is
SHLIB_PATH=`dirname $DSHOME`/branded_odbc/lib:$DSHOME/lib:$DSHOME/uvdlls:$DSHOME/java/jre/lib/PA_RISC/hotspot:$DSHOME/java/jre/lib/PA_RISC:$SHLIB_PATH

ray.wurlod wrote:
Do you execute the $DSHOME/dsenv script as an interactive user before attempting to use the $DSHOME/bin/dssh command?

Yes, I did.