Java Issues with DataStage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

hobocamp
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Java Issues with DataStage

Post by hobocamp »

We upgraded to DS 8.0.1 at our shop back in May. Since then, we have had random job failures where the job runs successfully with no warnings or errors, but then returns a code of 134.

Since our job script checks that return code and finds a value other than 1, it resets the job and our process abends.
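In outline, the check in our wrapper script does something like this (a simplified sketch; `handle_rc` and the messages are hypothetical placeholders, not our actual script):

```shell
# Simplified sketch of the wrapper's return-code handling
# (function name and messages are hypothetical placeholders).
handle_rc() {
    rc="$1"
    if [ "$rc" -ne 1 ]; then
        # Anything other than 1 is treated as a failure:
        # the job gets reset and the whole process abends.
        echo "reset-and-abend"
        return 1
    fi
    echo "ok"
    return 0
}
```

So a clean run that nonetheless exits with 134 still trips the failure branch.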

I've been working with IBM on this issue, but they've had no suggestions so far.

Also, when this type of error occurs, our logs show error messages like the one below:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0xfc51f94c, pid=15078, tid=7
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_11-b06 mixed mode)
# Problematic frame:
# V [libjvm.so+0x11f94c]
#
# An error report file with more information is saved as hs_err_pid15078.log

Our unix admin has recommended that we update our version of the Java Virtual Machine. (DS is currently running with build 1.4.2_11-b06.) He has asked me which directory the java executable binary runs from.

I have two questions out of all of this -

-Can anyone tell me which directory the java executable binary runs from?

-Has anyone else run into this issue? I've done extensive searching here and in other forums, and haven't seen this issue mentioned with DS. (Although the unix admin said he saw similar problems mentioned in other forums with other apps besides DS, and it appeared that the problem was usually solved by upgrading Java).

Thanks for any advice!

Tom
samunik
Premium Member
Posts: 10
Joined: Fri Oct 08, 2004 11:10 am
Location: USA

Re: Java Issues with DataStage

Post by samunik »

I believe the binary executable would be in the AppServer directory that was created.

Please update this thread with whatever you find as a resolution.

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

HP-UX?
-craig

"You can never have too many knives" -- Logan Nine Fingers
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Tom,
You should use the JVM that comes with DataStage, such as the following:
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142-20060421 (SR5) (JIT enabled: jitc))

Your error indicates that you are using the HotSpot VM.
You should also have a soft link to the ../IBM/InformationServer/ASBNode/apps/jre/bin.
Don't upgrade the JVM. Just make sure you are using IBM's build 1.4.2 shown above.
hobocamp
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Post by hobocamp »

Thanks for the replies so far.

lstsaur - According to the information below, provided by my unix admin, I believe I am using the correct version of Java:

/app/IBM/InformationServer/ASBNode/apps/jre/bin/java -version shows:

java version "1.4.2_11"
Java(TM) 2 Runtime Environment, Standard Edition (IBM build 1.4.2_11-b06 20060421)
Java HotSpot(TM) Client VM (build 1.4.2_11-b06, mixed mode)
IBM Java ORB build orb142-20060421 (SR5)
XML build XSLT4J Java 2.6.9
XML build XmlCommonsExternal 1.2.04
XML build XML4J 4.3.7
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Random errors are difficult... and in this case you are, as you should be, using only the JVM that comes in the box. Perhaps you can describe what you are doing in your classes - whether you are using JavaTransformer or JavaClient, etc. - and we can see if there's something that might be triggering the issue. Are you playing with external files, doing anything notable from a memory perspective, or communicating outside of the box within your class? [Just looking for variables that might be pushing on other limits/areas.]

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still wondering if this is the infamous HP-UX and $LD_PRELOAD problem.
-craig

"You can never have too many knives" -- Logan Nine Fingers
hobocamp
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Post by hobocamp »

Craig - I saw your post yesterday but didn't know what you meant. I did some searching and I think I now see what you were asking. We are on a Sun Solaris system (version 9, I believe), so I think that rules out any HP-UX issues.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yup. Just wanted to ask, as it's bitten me in the ass before. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
hobocamp
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Post by hobocamp »

Ernie -

I'm not dealing directly with Java at all. The only Java in the mix is whatever runs behind the scenes in v8.

The problem jobs are simple - for the most part, multi-instance server jobs. (Though we occasionally see this same error with regular server jobs as well.)

These are 4 multi-instance jobs that do a simple load, through a transformer, into one of 4 Oracle tables. (They're just for recording run-related stats.) These jobs run literally hundreds of times a night, and a failure occurs, at some random point, about twice a night.

One interesting point, though: our scripts read .csv files to determine the run order of jobs for all of our applications. The .csv runs one of these multi-instance jobs between essentially every main job. Then our scripts run this same multi-instance job (and three others) at the end of the process. The failures (return code 134), when they occur, happen when the script runs the jobs directly, not when the jobs are run through the .csv file.

Sorry for the long-winded post!

Tom
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Long posts are ok, as long as they are interesting. ;) Hmm... no Java... I wonder if it's something with the logging agent. I don't know that much about its architecture, except that it's definitely another Java piece that's out there in the mix. But it's odd that it would result in your jobs aborting. Is it only this one particular job that has these sporadic failures, or various jobs? Just grasping at ideas at the moment...

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Likewise the performance monitor is a Java app, but there should be no way that it could cause jobs to abort. That's the theory anyway. In practice jobs can abort through not being able to communicate with the performance monitor.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hobocamp
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Post by hobocamp »

Another odd thing is that we don't always get the dump I posted previously; sometimes there's no mention of anything related to Java at all. On those occasions (as in a failure we just had), the only indications of a problem are entries like these in the history and error logs (both of which are written by our job run script):

From the history log:

ETL completed with return code: 134
ETL Process Audit Job Failed

From the error log:

/data01/Ascential/DataStage/data/OdsProd/script/ETLOutboundSched.ksh[283]: 6323 Abort(coredump)


The one constant, java dump or not, is the mysterious 134 return code.
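For what it's worth, exit status 134 is the standard shell encoding for a process killed by signal 6 (SIGABRT): 128 + 6 = 134. That lines up with the Abort(coredump) line in the error log - the JVM is aborting, and the shell reports 134. A quick generic demonstration (nothing DataStage-specific):

```shell
# A process that dies from SIGABRT is reported by the shell as
# 128 + 6 = 134 -- the same status our job script keeps seeing.
sh -c 'kill -ABRT $$' || status=$?
echo "exit status: ${status}"   # prints: exit status: 134
```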

Tom
hobocamp
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Post by hobocamp »

The workaround we decided to use was to alter our job run scripts to allow for return codes of both 1 and 134.
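For anyone landing here later, the change amounts to something like this in the wrapper script (a sketch; `check_rc` is a hypothetical stand-in for the real logic):

```shell
# Sketch of the amended return-code check (hypothetical function name).
#   1   = job finished OK
#   134 = 128 + SIGABRT(6): the job itself completed cleanly, but the
#         JVM crashed on exit, so we now treat it as success too.
check_rc() {
    case "$1" in
        1|134) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Any other return code still resets the job and abends the process as before.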