
Java Issues with DataStage

Posted: Thu Sep 04, 2008 10:31 am
by hobocamp
We upgraded to DS 8.0.1 at our shop back in May. Since then, we have been seeing random job failures: the job runs successfully with no warnings or errors, but then returns a code of 134.

Since our job script checks that return code and finds a value other than 1, it resets the job, and our process abends.

I've been working with IBM on this issue, but they've had no suggestions so far.

Also, when this type of error occurs, our logs show error messages like the one below:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0xfc51f94c, pid=15078, tid=7
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_11-b06 mixed mode)
# Problematic frame:
# V [libjvm.so+0x11f94c]
#
# An error report file with more information is saved as hs_err_pid15078.log

Our unix admin has recommended that we update our version of the Java Virtual Machine. (DS is currently running with build 1.4.2_11-b06.) He has asked me which directory the java executable binary runs from.

I have two questions out of all of this -

-Can anyone tell me which directory the java executable binary runs from?

-Has anyone else run into this issue? I've done extensive searching here and in other forums, and haven't seen this issue mentioned with DS. (The unix admin did say he saw similar problems reported in other forums for apps other than DS, and the problem there was usually solved by upgrading Java.)

Thanks for any advice!

Tom

Re: Java Issues with DataStage

Posted: Thu Sep 04, 2008 11:11 am
by samunik
I believe the binary executable would be in the AppServer directory that was created.

Please update this thread with whatever resolution you find.

Thanks.

Posted: Thu Sep 04, 2008 11:22 am
by chulett
HP-UX?

Posted: Thu Sep 04, 2008 12:44 pm
by lstsaur
Tom,
You should use the JVM that comes with DataStage, such as the following:
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142-20060421 (SR5) (JIT enabled: jitc))

Your error indicates that you are using the HotSpot VM.
You should also have a soft link to ../IBM/InformationServer/ASBNode/apps/jre/bin.
Don't upgrade the JVM. Just make sure you are using IBM's build 1.4.2 shown above.
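As a rough check of the soft-link advice above, the sketch below resolves which `java` a shell would pick up from PATH, following symlinks, and compares it to the bundled JRE directory. The EXPECTED_JRE_BIN value is an assumption based on one install's layout (adjust it for yours), and the function names are hypothetical. Note that this only shows what a shell would run. It does not prove which JVM the DataStage engine itself launches.

```python
import os
import shutil

# Assumed install location of the bundled JRE; adjust for your own layout.
EXPECTED_JRE_BIN = "/app/IBM/InformationServer/ASBNode/apps/jre/bin"

def resolve_java(path_env=None):
    """Return the fully resolved path of the first `java` on PATH, or None.

    path_env overrides the PATH to search (None means the current PATH).
    """
    found = shutil.which("java", path=path_env)
    return os.path.realpath(found) if found else None

def uses_bundled_jre(java_path, jre_bin=EXPECTED_JRE_BIN):
    """True when java_path, after resolving symlinks, sits in the bundled JRE's bin dir."""
    if java_path is None:
        return False
    return os.path.dirname(os.path.realpath(java_path)) == os.path.realpath(jre_bin)
```

A typical use is `uses_bundled_jre(resolve_java())`, run as the same user and with the same environment as the job scripts.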

Posted: Fri Sep 05, 2008 9:13 am
by hobocamp
Thanks for the reply so far.

lstsaur - According to the information below, provided by my unix admin, I believe I am using the correct version of Java:

/app/IBM/InformationServer/ASBNode/apps/jre/bin/java -version shows:

java version "1.4.2_11"
Java(TM) 2 Runtime Environment, Standard Edition (IBM build 1.4.2_11-b06 20060421)
Java HotSpot(TM) Client VM (build 1.4.2_11-b06, mixed mode)
IBM Java ORB build orb142-20060421 (SR5)
XML build XSLT4J Java 2.6.9
XML build XmlCommonsExternal 1.2.04
XML build XML4J 4.3.7
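The VM line is what distinguishes the two outputs in this thread (IBM's Classic VM versus HotSpot). The small sketch below runs `java -version` and classifies the VM. Two points are worth knowing. First, `java -version` prints to stderr, not stdout. Second, the output above reports an IBM build that still runs HotSpot, so the VM name alone doesn't tell you whether the JRE is the bundled one. The function names and category labels here are hypothetical.

```python
import subprocess

def java_version_text(java_path):
    """Run `<java_path> -version` and return its output.

    The JVM writes this banner to stderr; stdout is used only as a fallback.
    Raises FileNotFoundError if java_path does not exist.
    """
    proc = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    return proc.stderr or proc.stdout

def vm_flavour(version_text):
    """Classify the VM named in `java -version` output (labels are illustrative)."""
    if "HotSpot" in version_text:
        return "hotspot"
    if "Classic VM" in version_text or "J9" in version_text:
        return "ibm"
    return "unknown"
```

For example, `vm_flavour(java_version_text("/app/IBM/InformationServer/ASBNode/apps/jre/bin/java"))` would report "hotspot" for the output quoted above.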

Posted: Fri Sep 05, 2008 10:16 am
by eostic
Random errors are difficult, and in this case you are, as you should be, using only the JVM that comes in the box. Perhaps you can describe what you are doing in your classes: are you using JavaTransformer or JavaClient, etc.? Then we can see if there's something that might be triggering the issue. Are you working with external files, doing anything memory-intensive, or communicating outside of the box from within your class? [I'm just looking for variables that might be pushing on other limits/areas.]

Ernie

Posted: Fri Sep 05, 2008 10:28 am
by chulett
Still wondering if this is the infamous HP-UX and $LD_PRELOAD problem.

Posted: Fri Sep 05, 2008 10:36 am
by hobocamp
Craig - I saw your post yesterday but didn't know what you meant. After some searching, I think I now see what you were asking. We are on Sun Solaris (version 9, I believe), so that should rule out the HP-UX problem.

Posted: Fri Sep 05, 2008 11:05 am
by chulett
Yup. Just wanted to ask, as it's bitten me in the ass before. ;)

Posted: Fri Sep 05, 2008 1:05 pm
by hobocamp
Ernie -

I'm not working directly with Java at all. The only Java involved is what runs behind the scenes in v8.

The problem jobs are simple: for the most part, they are multi-instance server jobs. (We occasionally see the same error with regular server jobs as well.)

These are 4 multi-instance jobs that do a simple load, through a transformer, into one of 4 Oracle tables (they just record run-related stats). These jobs run literally hundreds of times a night, and a failure occurs at a random time about twice a night.

One interesting point, though: our scripts read .csv files to determine the run order of jobs for all of our applications. The .csv runs one of these multi-instance jobs between basically every main job. Then, at the end of the process, our scripts run this same multi-instance job (and three others) directly. When the failures (return code 134) occur, they occur when the script runs the jobs, not when the jobs run through the .csv file.

Sorry for the long-winded post!

Tom

Posted: Fri Sep 05, 2008 2:53 pm
by eostic
Long posts are OK, as long as they're interesting. ;) Hmm, no Java... I wonder if it's something with the logging agent. I don't know much about its architecture, except that it's definitely another Java component in the mix. Still, it's odd that it would cause your jobs to abort. Is it only this one particular job that has these sporadic failures, or various jobs? Just grasping at ideas at the moment...

Ernie

Posted: Fri Sep 05, 2008 6:01 pm
by ray.wurlod
Likewise, the performance monitor is a Java app, but in theory there is no way it could cause jobs to abort. In practice, though, jobs can abort when they are unable to communicate with the performance monitor.

Posted: Wed Sep 10, 2008 12:56 pm
by hobocamp
Another odd thing is that we don't always get the dump I posted previously, and there's no mention of anything related to Java. On these occasions (as in a failure we just had), the only indication of a problem is these entries in the history and error logs, both of which are written by our job run script:

From the history log:

ETL completed with return code: 134
ETL Process Audit Job Failed

From the error log:

/data01/Ascential/DataStage/data/OdsProd/script/ETLOutboundSched.ksh[283]: 6323 Abort(coredump)


The one constant, java dump or not, is the mysterious 134 return code.
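The 134 is less mysterious than it looks. POSIX only requires that a command killed by a signal report a status above 128. The usual convention is 128 + N, and the ksh here evidently follows it, given the "Abort(coredump)" message. So 134 means the process was killed by signal 6, SIGABRT. This fits the HotSpot banner too: the JVM likely traps the SIGSEGV itself, writes the hs_err file, and then aborts, which would explain why the shell sees 6 rather than 11. Below is a minimal sketch that decodes such a status. The function name is hypothetical, and the signal names come from the host's `signal` module, so they are only meaningful on the same kind of POSIX system.

```python
import signal

def describe_exit_status(status):
    """Interpret a shell-style exit status ($?) from a job-run script.

    Assumes the common 128 + N convention for commands killed by signal N.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number this platform knows about.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with {status}"
```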

Tom

Posted: Thu Sep 18, 2008 5:42 am
by hobocamp
The workaround we decided to use was to alter our job run scripts to allow for return codes of both 1 and 134.
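In outline, the workaround changes the success test from "return code == 1" to "return code in {1, 134}". The actual ksh script isn't shown in the thread, so this is only a sketch: the names and the logger are hypothetical. The trade-off is that any genuine abort that also returns 134 will now be treated as success. For that reason, the sketch logs each accepted 134 separately so the sporadic aborts stay visible and countable.

```python
import logging

log = logging.getLogger("etl_runner")  # hypothetical logger name

# Return codes treated as success after the workaround:
# 1 was the original "good" value; 134 (128 + SIGABRT) is the sporadic abort code.
ACCEPTED_CODES = frozenset({1, 134})

def job_succeeded(return_code, accepted=ACCEPTED_CODES):
    """Return True if the run script should carry on rather than reset the job."""
    if return_code not in accepted:
        return False
    if return_code == 134:
        # Accepted, but flagged so these aborts are not silently hidden.
        log.warning("job returned 134 (SIGABRT); accepted by workaround")
    return True
```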