Socket closed error, when parallel jobs run in sequence

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

reena123
Participant
Posts: 9
Joined: Sat Dec 02, 2006 5:31 am

Post by reena123 »

Hi All,

I have a sequence, SEQ1, in which I am running parallel jobs, say A, B, C, ..., X, Y, Z, as below:
-----A-----
..
..
-----Y-----
-----Z-----

When I run these jobs individually, they run fine. But when I run them in sequence SEQ1, a few jobs abort at random. On one run A, C, and F may abort; on the next run B, G, and I may abort.

Below is the error message:
APT_CombinedOperatorController,1: Failure during execution of operator logic.
APT_CombinedOperatorController,1: Fatal Error: Could not connect to datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.
:
APT_CombinedOperatorController,0: Failure during execution of operator logic.
APT_CombinedOperatorController,0: Fatal Error: Could not connect to datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.
:
node_node2: combination of 5 operators [APT_TransformOperatorImplV3S2_wr_cust_code_tfm_cust_code in tfm_cust_code; wrti_cust_code; wrtirj_cust_code; wrtu_cust_code; wrturj_cust_code], partition 1 of 2, processID 18,143 on node2, player 1 terminated unexpectedly.
main_program: Unexpected exit status 1
sfs_cust_code,0: Internal Error: (shbuf): iomgr/iomgr.C: 1732
Traceback: Could not obtain stack trace; check that 'gdb' and 'sed' are installed and on your PATH

----------
In other jobs that abort (when run in SEQ1), the message may be something like:

APT_CombinedOperatorController,1: Failure during execution of operator logic.
APT_CombinedOperatorController,1: Fatal Error: Could not connect to datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.
:
APT_CombinedOperatorController,0: Failure during execution of operator logic.
main_program: Unexpected exit status 1
sfs_free_ship_code,0: Failure during execution of operator logic.
sfs_free_ship_code,0: Fatal Error: Unable to allocate communication resources
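
Because the aborts are random, it may help to tally which error signature each failed job actually reports across several runs. The following is only a rough sketch: the substrings are copied from the log excerpts above, while the labels and the function names `classify` and `summarize` are made up for illustration and are not part of DataStage.

```python
from collections import Counter

# Substrings taken from the log excerpts in this post, mapped to
# short labels. The labels are hypothetical, chosen for illustration.
KNOWN_ERRORS = {
    "Socket closed": "odbc_socket_closed",
    "Unable to allocate communication resources": "comm_resources",
    "terminated unexpectedly": "player_terminated",
    "Internal Error: (shbuf)": "shbuf_internal_error",
}


def classify(line):
    """Return the label of the first known error found in a log line, or None."""
    for needle, label in KNOWN_ERRORS.items():
        if needle in line:
            return label
    return None


def summarize(log_lines):
    """Count how often each known error signature appears in a job log."""
    counts = Counter()
    for line in log_lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "APT_CombinedOperatorController,1: Fatal Error: Could not connect to "
        "datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.",
        "main_program: Unexpected exit status 1",
        "sfs_free_ship_code,0: Fatal Error: Unable to allocate communication resources",
    ]
    for label, count in summarize(sample).items():
        print(label, count)
```

Running this over the log of each aborted job would show whether every failure starts with the ODBC "Socket closed" message or whether some jobs fail only on communication resources.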



Any suggestions?

_________________
Thanks and Regards
Reena
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Hi,

I think these errors are all caused by the resources allocated to the jobs for monitoring the data.

There are environment variables called APT_MONITOR_SIZE and APT_MONITOR_TIME that control the monitoring functionality in DataStage.

Override the default settings first: set APT_MONITOR_SIZE to 100000 and APT_MONITOR_TIME to 25, and try the run again.

If that does not work, the only other alternative is to turn monitoring off by setting APT_NO_JOBMON=TRUE.

Try these two options in that order and let us know the results.
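
As a rough sketch of the two options, the snippet below builds a `dsjob -run` argument list that passes the overrides as job parameters. It assumes the three variables have already been added to the job as environment-variable parameters (hence the leading `$`). The project and job names are placeholders, and the helpers `monitor_overrides` and `dsjob_run_command` are hypothetical, not part of any DataStage API.

```python
def monitor_overrides(disable_monitor=False):
    """Return the environment-variable overrides suggested in this post.

    With disable_monitor=False, raise the monitor size and time thresholds;
    with disable_monitor=True, switch the job monitor off instead.
    """
    if disable_monitor:
        return {"APT_NO_JOBMON": "TRUE"}
    return {"APT_MONITOR_SIZE": "100000", "APT_MONITOR_TIME": "25"}


def dsjob_run_command(project, job, overrides):
    """Build a dsjob argument list that passes each override via -param.

    Assumption: each variable exists on the job as an environment-variable
    parameter, so it is addressed as $NAME on the command line.
    """
    args = ["dsjob", "-run"]
    for name, value in overrides.items():
        args += ["-param", f"${name}={value}"]
    args += ["-jobstatus", project, job]
    return args


if __name__ == "__main__":
    # "MYPROJECT" and "A" are placeholder names, not taken from a real setup.
    print(dsjob_run_command("MYPROJECT", "A", monitor_overrides()))
    print(dsjob_run_command("MYPROJECT", "A", monitor_overrides(disable_monitor=True)))
```

The same values can instead be set in the job properties or at project level in the Administrator; the command-line form is shown only because it is easy to repeat for each job in the sequence.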
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Post by thebird »

Don't these errors look more like ODBC issues than job monitor problems?

Moreover, would a job abort if monitoring had issues with the number of records to be monitored or the monitoring interval? Just wondering, since that is what the environment variables mentioned relate to.

Thanks

Aneesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Was Sybase shut down or quiesced before or while this job was running?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
reena123
Participant
Posts: 9
Joined: Sat Dec 02, 2006 5:31 am

Post by reena123 »

I tried setting all the monitoring options as suggested, but none of them worked for me. I am still getting the Socket closed error.

No, Sybase was not shut down or quiesced before or while this job was running.

Thanks and Regards,
Reena