Connection Lost Contact

Posted: Fri Dec 02, 2011 9:43 am
by DeepakCorning
Version : 7.5.2
Server : Windows OS

Issue : One of my sequencers triggers 4 jobs simultaneously. Three of them complete, but one waits for ~90 minutes and then fails with ORA-03135: connection lost contact.

The job that fails is not the same one every time, and the error is sporadic: everything runs fine for, say, 7 days and then one day the failure appears. When I restart the job, everything goes back to normal.

The SQL trace file on the DataStage server shows the following connection attempt, but strangely we do not use this TNS entry at all (it is not in the TNS file and not in any of the jobs):

(DESCRIPTION=(ADDRESS=(PROTOCOL=BEQ)(PROGRAM=oracle)(ARGV0=oracleORCL)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=ORCL)(CID=(PROGRAM=E:\Ascential\DataStage\Engine\bin\uvsh.exe)(HOST=XXX)(USER=XXXX))))
Protocol Error
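
For anyone who wants to pick the descriptor apart, here is a minimal Python sketch that pulls the innermost KEY=VALUE pairs out of the string above. It is not part of DataStage or the Oracle client; the names `DESCRIPTOR` and `leaf_pairs` are made up for illustration, and the regex is only assumed to be good enough for a descriptor shaped like this one. It makes it easy to see that the attempt asks for PROTOCOL=BEQ (Oracle's bequeath protocol, normally used for local connections that bypass the listener) and SID=ORCL.

```python
import re

# Connect descriptor copied from the trace file (raw string keeps the backslashes intact).
DESCRIPTOR = (
    r"(DESCRIPTION=(ADDRESS=(PROTOCOL=BEQ)(PROGRAM=oracle)(ARGV0=oracleORCL)"
    r"(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))"
    r"(CONNECT_DATA=(SID=ORCL)(CID=(PROGRAM=E:\Ascential\DataStage\Engine\bin\uvsh.exe)"
    r"(HOST=XXX)(USER=XXXX))))"
)

# Matches only innermost pairs such as (SID=ORCL); nested groups like
# (ADDRESS=(...)) are skipped because their value starts with a parenthesis.
LEAF = re.compile(r"\(([A-Za-z0-9_]+)=([^()]+)\)")


def leaf_pairs(descriptor):
    """Return the innermost (KEY, VALUE) pairs in the order they appear."""
    return LEAF.findall(descriptor)


if __name__ == "__main__":
    for key, value in leaf_pairs(DESCRIPTOR):
        print(f"{key:10} {value}")
```

Running it prints one line per pair, so the protocol, SID and calling program (uvsh.exe) can be compared against what the jobs are actually configured to use.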

To me it looks like it is sometimes unable to work out the correct connection settings (maybe the CPU is too busy) and tries a wrong TNS name instead. Has anyone seen this?

Thanks
Dk

Posted: Fri Dec 02, 2011 10:10 am
by DeepakCorning
Some more details: the log of the failed job has only 7 entries. The main ones to notice are:

- Start Job
- Load Environment Variables
- Failure (after waiting for 90 minutes)


No "queries" are issued... it is probably just trying to ping?