Hi All,
I have a sequence, SEQ1, in which I am running parallel jobs, say A, B, C, ..., X, Y, Z, as below:
-----A-----
..
..
-----Y-----
-----Z-----
When I run these jobs individually, they run fine. But when I run them in sequence SEQ1, a few jobs abort at random. Say, sometimes A, C and F will abort; on a second run B, G and I will abort.
Below is the error message:
APT_CombinedOperatorController,1: Failure during execution of operator logic.
APT_CombinedOperatorController,1: Fatal Error: Could not connect to datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.
:
APT_CombinedOperatorController,0: Failure during execution of operator logic.
APT_CombinedOperatorController,0: Fatal Error: Could not connect to datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.
:
node_node2: combination of 5 operators [APT_TransformOperatorImplV3S2_wr_cust_code_tfm_cust_code in tfm_cust_code; wrti_cust_code; wrtirj_cust_code; wrtu_cust_code; wrturj_cust_code], partition 1 of 2, processID 18,143 on node2, player 1 terminated unexpectedly.
main_program: Unexpected exit status 1
sfs_cust_code,0: Internal Error: (shbuf): iomgr/iomgr.C: 1732
Traceback: Could not obtain stack trace; check that 'gdb' and 'sed' are installed and on your PATH
----------
In other jobs that abort (when run in SEQ1), the message may be something like:
APT_CombinedOperatorController,1: Failure during execution of operator logic.
APT_CombinedOperatorController,1: Fatal Error: Could not connect to datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.
:
APT_CombinedOperatorController,0: Failure during execution of operator logic.
main_program: Unexpected exit status 1
sfs_free_ship_code,0: Failure during execution of operator logic.
sfs_free_ship_code,0: Fatal Error: Unable to allocate communication resources
Any suggestions?
_________________
Thanks and Regards
Reena
Socket closed error, when parallel jobs run in sequence
-
- Participant
- Posts: 222
- Joined: Tue Aug 30, 2005 2:07 am
- Location: pune
- Contact:
Hi,
I think these aborts are caused by the resources allocated to the jobs for monitoring the data.
There are environment variables called APT_MONITOR_SIZE and APT_MONITOR_TIME, which control the job monitoring functionality in DataStage.
Override the default settings: set APT_MONITOR_SIZE to 100000 and APT_MONITOR_TIME to 25, and try that.
If that does not work, the only other alternative is to turn monitoring off by setting APT_NO_JOBMON=TRUE.
Try the above two in the order mentioned and let us know your results.
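The two-step order suggested above could be sketched roughly like this in Python. It is only an illustration: the function name `monitor_overrides` is made up, and how the values actually reach the jobs (project defaults or job parameters) is left out.

```python
def monitor_overrides(attempt):
    """Return the environment-variable overrides to try on a given attempt.

    attempt 1: relax the job monitor settings
    attempt 2: switch the job monitor off altogether
    (hypothetical helper; the values are the ones quoted in this post)
    """
    if attempt == 1:
        return {"APT_MONITOR_SIZE": "100000", "APT_MONITOR_TIME": "25"}
    if attempt == 2:
        return {"APT_NO_JOBMON": "TRUE"}
    raise ValueError("only two attempts are suggested in this thread")


# Print the settings for each attempt, in the order they should be tried.
for attempt in (1, 2):
    for name, value in sorted(monitor_overrides(attempt).items()):
        print(f"attempt {attempt}: {name}={value}")
```

Only move on to the second attempt if the jobs still abort with the first set of values.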
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
Nageshsunkoji wrote:
Hi,
I think these aborts are caused by the resources allocated to the jobs for monitoring the data.
There are environment variables called APT_MONITOR_SIZE and APT_MONITOR_TIME, which control the job monitoring functionality in DataStage.
Override the default settings: set APT_MONITOR_SIZE to 100000 and APT_MONITOR_TIME to 25, and try that.
If that does not work, the only other alternative is to turn monitoring off by setting APT_NO_JOBMON=TRUE.
Try the above two in the order mentioned and let us know your results.

Don't these errors look more like ODBC issues than job monitor problems?
Moreover, would a job abort just because monitoring has issues with the number of records to be monitored or the monitoring interval? Just wondering, since that is what the environment variables mentioned relate to.
Thanks
Aneesh
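
One way to check whether the aborts are dominated by ODBC connection failures would be to tally the fatal messages across the logs of the aborted jobs. A rough Python sketch, assuming the log entries have been exported as plain text lines; the category names and patterns are illustrative and taken only from the messages quoted in this thread:

```python
import re
from collections import Counter

# Hypothetical categories, based only on the messages quoted in this thread.
PATTERNS = {
    "odbc_connect": re.compile(r"Could not connect to datasource"),
    "comm_resources": re.compile(r"Unable to allocate communication resources"),
    "player_died": re.compile(r"terminated unexpectedly"),
}


def classify(lines):
    """Count how many log lines fall into each category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines copied from the error messages in the original post.
sample = [
    "APT_CombinedOperatorController,1: Fatal Error: Could not connect to "
    "datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.",
    "APT_CombinedOperatorController,0: Fatal Error: Could not connect to "
    "datasource[DataDirect][ODBC Sybase Wire Protocol driver]Socket closed.",
    "sfs_free_ship_code,0: Fatal Error: Unable to allocate communication resources",
]
print(dict(classify(sample)))
```

If most aborted jobs show the connection category first, that would point towards the database or driver side rather than the job monitor.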
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: