
Posted: Wed Sep 29, 2010 9:26 pm
by siauchun84
When your job hangs, have you checked Task Manager on the server for the total number of Osh.exe processes? I faced a similar problem before, in which lots of Osh.exe processes were still present in the task list.

Posted: Wed Sep 29, 2010 11:24 pm
by wbeitler
Thanks for that. Weird problems make you look for weird things...
I'll keep you posted.

Posted: Thu Sep 30, 2010 3:59 am
by wbeitler
siauchun84,
siauchun84 wrote:When your job hangs, have you checked Task Manager on the server for the total number of Osh.exe processes?
Aborting the job manually actually did get rid of 12 osh.exe processes.
Care to share how you solved your problem?

Posted: Sun Oct 03, 2010 10:07 pm
by siauchun84
wbeitler wrote:siauchun84,
Aborting the job manually actually did get rid of 12 osh.exe processes.
Care to share how you solved your problem?
Hi wbeitler, what I did was create a VBScript file that kills all osh.exe processes before I trigger my next job, since I was on the Windows platform.
*Note: Killing osh.exe will terminate all running jobs.

You may copy the following into Notepad and save it as a .vbs file on the server (just double-click it if you have direct access to the server; if not, create a sequence job to trigger the script):
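-------- VB Script --------
Option Explicit

Dim wql
Dim wmi
Dim oResults
Dim oProcess

' Connect to the local WMI service
Set wmi = GetObject("winmgmts:")
' Select every running osh.exe process
wql = "SELECT * FROM Win32_Process WHERE Name = 'osh.exe'"
Set oResults = wmi.ExecQuery(wql)
' Terminate each match (this also kills osh.exe of jobs that are still running)
For Each oProcess In oResults
    oProcess.Terminate
Next
Set oResults = Nothing
Set wmi = Nothing
---- End of VB Script ----------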
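
If killing every osh.exe is too drastic (for example, when several jobs run in parallel), a read-only variant can at least show how many osh.exe processes exist and when each one started. This is only a minimal sketch, assuming the same WMI access as the script above; it terminates nothing. ProcessId and CreationDate are standard Win32_Process properties, and the output wording is just an illustration:
-------- VB Script --------
Option Explicit

Dim wmi
Dim oResults
Dim oProcess

' Connect to the local WMI service
Set wmi = GetObject("winmgmts:")
' Query every running osh.exe process without touching it
Set oResults = wmi.ExecQuery("SELECT * FROM Win32_Process WHERE Name = 'osh.exe'")

WScript.Echo "osh.exe processes running: " & oResults.Count
' List the PID and the raw WMI start timestamp of each process
For Each oProcess In oResults
    WScript.Echo "PID " & oProcess.ProcessId & " started " & oProcess.CreationDate
Next
Set oResults = Nothing
Set wmi = Nothing
---- End of VB Script ----------
Running it with cscript (rather than double-clicking) prints the lines to the console instead of opening one message box per line.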

Posted: Wed Oct 06, 2010 5:58 am
by wbeitler
Siauchun,

Thanks for sharing, but that's not an option here since we're running multiple jobs in parallel.

Jobs actually didn't crash after we enabled the APT_RECORD_COUNTS reporting environment variable. That somehow seems to 'keep the communication alive'?! Would that make any sense?!

Posted: Thu Oct 07, 2010 3:41 am
by wbeitler
Doesn't make any sense... The job is hanging again, although we now do get the second 'Load complete' message, but the 'Record count' for only 1 of the nodes... Waiting in vain for the second node to return its record count :cry:

Posted: Thu Oct 07, 2010 5:29 am
by chulett
So... did you ever bring support into the picture? Have they tried to figure out what in the heck might be going on? Seems like you are well and truly into their territory now.

Posted: Thu Oct 07, 2010 5:50 am
by wbeitler
PMR- logged. Still waiting for their answer.

Posted: Wed Nov 17, 2010 8:03 am
by wbeitler
Installed patch JR36567 and Fixpack 3 as advised by IBM Support.
Didn't fix the problem though... Any new thoughts?

William