Communication error

m_mani87
Participant
Posts: 45
Joined: Thu May 24, 2012 11:13 pm
Location: Coimbatore

Communication error

Post by m_mani87 »

Hi,

We have a wave job. The structure is as below.

MQ Connector ----> Transformations ----> Lookup against DB2 ----> MQ Connector + Distributed Transaction (after the lookup we do some transformations and load to these two outputs)

The issue is that the job fails with the error below at least once a day. My guess is that this happens mostly outside business hours, when there are fewer input messages (I might be wrong about this). So our working theory is that the connection is lost because the job sits idle with no records to process.


Lkp_ItemId_Advice,0: SQLExecute reported: SQLSTATE = 40003: Native Error Code = -30,081: Msg = [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.146.53.15". Communication function detecting the error: "send". Protocol specific error code(s): "32", "*", "0". SQLSTATE=08001 (CC_DB2DBStatement::executeSelect, file CC_DB2DBStatement.cpp, line 1,958)
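For anyone reading the error above: the useful parts are the detecting function ("send") and the protocol specific error codes. Here is a minimal Python sketch that pulls those fields out of the message text. It assumes the first code is the operating system errno on the engine host, which is how TCP/IP codes are usually reported on Unix. On Linux, errno 32 is EPIPE (broken pipe), which would fit a connection that the other side or something in between had already closed. The thread does not say which OS is in use, and `parse_sql30081n` and `errno_name` are hypothetical helper names, not part of any DB2 or DataStage tooling.

```python
import errno
import re

# The error text from the post, copied as-is.
SAMPLE = (
    'Lkp_ItemId_Advice,0: SQLExecute reported: SQLSTATE = 40003: '
    'Native Error Code = -30,081: Msg = [IBM][CLI Driver] SQL30081N '
    'A communication error has been detected. '
    'Communication protocol being used: "TCP/IP". '
    'Communication API being used: "SOCKETS". '
    'Location where the error was detected: "10.146.53.15". '
    'Communication function detecting the error: "send". '
    'Protocol specific error code(s): "32", "*", "0". SQLSTATE=08001'
)

# One regex per quoted field in the SQL30081N text.
FIELD_PATTERNS = {
    "protocol": r'Communication protocol being used:\s*"([^"]*)"',
    "api": r'Communication API being used:\s*"([^"]*)"',
    "location": r'Location where the error was detected:\s*"([^"]*)"',
    "function": r'Communication function detecting the error:\s*"([^"]*)"',
}
CODES_PATTERN = r'Protocol specific error code\(s\):\s*((?:"[^"]*"(?:,\s*)?)+)'


def parse_sql30081n(message):
    """Return the quoted fields of an SQL30081N message as a dict."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, message)
        fields[name] = match.group(1) if match else None
    codes = re.search(CODES_PATTERN, message)
    fields["codes"] = re.findall(r'"([^"]*)"', codes.group(1)) if codes else []
    return fields


def errno_name(code):
    """Map a numeric code to the local OS errno name, or None if unknown.

    Assumption: the code is an errno value. The mapping is platform
    specific, so run this on the same OS family as the engine host.
    """
    try:
        return errno.errorcode.get(int(code))
    except ValueError:
        return None  # non-numeric placeholders such as "*"


if __name__ == "__main__":
    info = parse_sql30081n(SAMPLE)
    print(info["function"], info["location"], info["codes"])
    print(errno_name(info["codes"][0]))
```
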


Things done so far:
1) Enabled debug logging and got this exception just before the communication error.

Lkp_ItemId_Advice,0: PXBridgeOp::runLocally *not* CC_Exception::CC_X_DAAPI_ROW_ERROR
PXBridgeOp::runLocally CC_Exception 4

2) Set the query timeout to 0, as suggested by IBM through a PMR we raised (db2 update cli cfg for section common using QUERYTIMEOUTINTERVAL 0).

3) Set the keep-alive timeout to 20000 (in the file db2dsdriver.cfg).

4) Set "Keep conductor connection alive" to No (in the DB2 connector properties).


Nothing has worked so far and the job still fails at least once a day. I have been working on this for more than a month, and none of the IBM information on DataStage communication errors has helped.
Any suggestions would be a great help, as I want to resolve this as soon as possible.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Is this the only job of this type that is running overnight? Or do you have other MQ jobs that don't have these symptoms? I'm just trying to determine if it could be something external to the job that is disconnecting the session.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
m_mani87
Participant
Posts: 45
Joined: Thu May 24, 2012 11:13 pm
Location: Coimbatore

Post by m_mani87 »

We have about 3 jobs, all of them wave jobs, where we have this problem.
Normal parallel jobs don't have this issue, even though they also use MQ.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

If all the Wave jobs are having the issue then it might be something external to DataStage that is disrupting the connection. Do you have a network team that could assist by providing some monitoring?

Do all the jobs abort at once? Can you restart them all at the same time to see if they all abort after roughly the same amount of time? If they all abort at once, then it might be some sort of network / firewall blip that is resetting the connection. If they each abort after roughly the same amount of time, then it's a timeout setting on something.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
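
The comparison suggested in the post above can be sketched in a few lines of Python. This is only an illustration of the reasoning. It assumes you have noted a start time and an abort time for each job, and the function name `classify_aborts`, the 5-minute tolerance, and the timestamps in the example are all made up. Note that if every job is started at the same moment, "abort at once" and "abort after the same elapsed time" look identical, so the sketch reports that case as ambiguous; staggered starts would be needed to separate the two.

```python
from datetime import datetime, timedelta


def classify_aborts(runs, tolerance=timedelta(minutes=5)):
    """Classify a set of job aborts.

    runs: list of (start, abort) datetime pairs, one pair per job.
    Returns one of:
      "external-event" - aborts cluster at the same moment
      "timeout"        - jobs ran for about the same length of time
      "ambiguous"      - both hold (e.g. all jobs started together)
      "inconclusive"   - neither holds, or fewer than two jobs
    """
    if len(runs) < 2:
        return "inconclusive"

    abort_times = [abort for _, abort in runs]
    durations = [abort - start for start, abort in runs]

    same_moment = max(abort_times) - min(abort_times) <= tolerance
    same_duration = max(durations) - min(durations) <= tolerance

    if same_moment and same_duration:
        return "ambiguous"
    if same_moment:
        return "external-event"
    if same_duration:
        return "timeout"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical example: three jobs started 30 minutes apart.
    starts = [datetime(2024, 1, 1, 20, 0),
              datetime(2024, 1, 1, 20, 30),
              datetime(2024, 1, 1, 21, 0)]
    # All three die at the same moment.
    blip = [(s, datetime(2024, 1, 2, 2, 14)) for s in starts]
    # Each one dies six hours after its own start.
    idle = [(s, s + timedelta(hours=6)) for s in starts]
    print(classify_aborts(blip))
    print(classify_aborts(idle))
```
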
m_mani87
Participant
Posts: 45
Joined: Thu May 24, 2012 11:13 pm
Location: Coimbatore

Post by m_mani87 »

I suspect that a load balancer in our network is causing the problem by resetting the connection.

Is there a way to keep the wave job from losing the connection at the DataStage level?

Also, if that alone won't help, can it be done at the:

1) Server level
2) DB level
3) DataStage level
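
For context on what "keep alive" means at the server (OS) level, here is a minimal Python sketch of TCP keepalive on a plain socket. It does not configure the DB2 CLI driver or DataStage. It only illustrates the mechanism: periodic probes on an idle connection, so that a load balancer with an idle timeout keeps seeing traffic. The function name `enable_keepalive` and the 60/10/5 values are assumptions chosen for illustration; the probes would have to start before whatever idle timeout the load balancer enforces, which the thread does not state. The `TCP_KEEP*` options are platform specific (present on Linux), so the sketch checks for them first.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive for an existing socket.

    idle_s:     seconds of silence before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the connection is dropped
    """
    # Portable switch: ask the OS to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timing knobs; not every platform exposes them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Non-zero means keepalive is switched on for this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```
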