Hi,
We have a wave job with the following structure:
MQ connector ----> Transformations ----> Lookup against DB2 ----> MQ connector + Distributed Transaction stage (after the lookup we apply some transformations and load to these two outputs).
The issue is that the job fails with the error below at least once a day. My hunch (which may be wrong) is that it happens mostly during off-business hours, when input messages are sparse. So our working theory is that the connection is being dropped because the job sits idle with no records being processed.
Lkp_ItemId_Advice,0: SQLExecute reported: SQLSTATE = 40003: Native Error Code = -30,081: Msg = [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.146.53.15". Communication function detecting the error: "send". Protocol specific error code(s): "32", "*", "0". SQLSTATE=08001 (CC_DB2DBStatement::executeSelect, file CC_DB2DBStatement.cpp, line 1,958)
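For what it's worth, the protocol-specific error code "32" in SQL30081N is the OS-level errno from the failing send() call. On Linux/UNIX, errno 32 is EPIPE ("Broken pipe"), meaning the DB2 server side (or something in between, such as a firewall) had already closed the connection by the time the client tried to write to it, which fits the idle-connection theory. A quick way to confirm the mapping on the client host (a sketch; errno values are platform-specific):

```python
import errno
import os

# errno 32 on Linux is EPIPE: the peer closed the connection
# and we then tried to write to the dead socket.
print(errno.errorcode[32])  # EPIPE
print(os.strerror(32))      # Broken pipe
```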
Things done so far:
1) Enabled debug logging and captured this exception just before the communication error:
Lkp_ItemId_Advice,0: PXBridgeOp::runLocally *not* CC_Exception::CC_X_DAAPI_ROW_ERROR
PXBridgeOp::runLocally CC_Exception 4
2) Set the query timeout to 0, as suggested by IBM through a PMR we raised (db2 update cli cfg for section common using QUERYTIMEOUTINTERVAL 0).
3) Set the keepalive timeout to 20000 (in the file db2dsdriver.cfg).
4) Set "Keep conductor connection alive" to No (in the DB2 Connector stage properties).
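One thing worth double-checking on step 3: as I understand the db2dsdriver.cfg documentation, the KeepAliveTimeout keyword is specified in seconds, so 20000 would mean the first keepalive probe goes out only after roughly five and a half hours of idle time. If a firewall between DataStage and DB2 silently drops connections idle for, say, 30-60 minutes, that keepalive would never fire in time. A sketch of the setting with a much smaller value (the database name, host, and port here are placeholders, not your real server):

```xml
<configuration>
  <databases>
    <database name="SAMPLEDB" host="10.146.53.15" port="50000">
      <!-- Value is in seconds: probe the idle connection well before
           any firewall idle timeout can silently drop it. -->
      <parameter name="KeepAliveTimeout" value="60"/>
    </database>
  </databases>
</configuration>
```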
Nothing has worked so far, and the job still fails at least once a day. I have been working on this for more than a month, and none of the IBM material on DataStage communication errors has helped.
Your suggestions would be a great help, as I want to resolve this as soon as possible.
If all the wave jobs are having the issue, then it might be something external to DataStage that is disrupting the connection. Do you have a network team that could assist by providing some monitoring?
Do all the jobs abort at once? Can you restart them all at the same time to see whether they all abort after roughly the same amount of time? If the former, it might be some sort of network/firewall blip that is resetting the connections. If the latter, it's a timeout setting on something.
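If it does turn out to be an idle timeout, the fix is to make the OS send keepalive probes on the otherwise-idle connection, which is what the DB2 client's KeepAliveTimeout setting does under the hood. The idea can be sketched at the raw socket level (Linux-specific socket options; a DB2 connection would get the same treatment from the driver, you would not do this by hand):

```python
import socket

# Sketch: enable aggressive TCP keepalive on a client socket so the
# kernel probes the peer while the connection is idle, instead of a
# firewall silently dropping it after its idle timeout.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific knobs: start probing after 60s of idle time,
# probe every 10s, declare the peer dead after 5 failed probes.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```

Comparing the keepalive interval you configure against the firewall's idle timeout (ask the network team for that number) is usually the quickest way to confirm or rule out this theory.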