Communication link failure

Precious · Post by **Precious** » Sat Nov 19, 2005 6:04 am

Hi all,

The job started at 2:08:56 PM, ran till 4:39:20 PM and then aborted on the following errors:

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]ConnectionRead (recv()).

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. Check your network documentation.

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]ConnectionRead (recv()).

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. Check your network documentation.

CUSTOMER_FACT_0000_HASH..TFM1: SQLFetch: Error retrieving results from server. 

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver]Communication link failure
SQLTransact: Error committing transaction.

OS Client - Windows 2000 Terminal
OS Server - Windows Server 2003 Datacenter Edition
Database - MS SQL Server 2000

The job reads from an MS SQL Server database and writes to a hash file.
A DRS stage is being used to read from the database, with the array size set to 1,000,000.
This job has run successfully once before, writing over 13,000,000 rows to the hash file.

Is this a DataStage error or a database error? The DataStage engine and the database server are on the same box.
This was not the only job involved. The other jobs are set up on more or less the same principle. They all failed at different times, but with the same message.

Thanx,

ArndW · Post by **ArndW** » Sat Nov 19, 2005 7:38 am

It seems that something timed out, either the TCP/IP connection or the database commit.

1. What is your commit frequency?
2. Is the problem reproducible, either at the same number of rows processed within the job or at a similar runtime for the same job? This might help show whether it is the database or not.

kcbland · Post by **kcbland** » Sat Nov 19, 2005 8:53 am

The job ran for 2 1/2 hours trying to pull out 13 million rows. Doesn't this seem excessive? It means that for 2 1/2 hours you have an open cursor on 13 million source rows, with all of that data held in rollback. This is bad. You need to NOT hold that data that long, I don't care what database you are using. Get the data out faster by writing to a sequential text file; you'll see the job fly when it isn't hashing all of that data. Run multiple job instances simultaneously, each pulling a different portion of the data into its own text file. With n instances you'll get results roughly n times faster, because more data is coming out of the database at once.
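Running several instances only helps if each one pulls a disjoint slice of the source. A minimal sketch, assuming a hypothetical table CUSTOMER with an integer key column CUSTOMER_ID, that builds one modulo-partitioned SELECT per instance plus a matching output file name. The table, column, and file-naming scheme are illustrative assumptions, not taken from the thread.

```python
"""Sketch: split one large extract into N disjoint slices, one per job instance.

Hypothetical names: table CUSTOMER, integer key CUSTOMER_ID, and output files
customer_extract_NN.txt. Adjust to your own schema and naming.
"""


def partition_queries(table: str, key: str, instances: int) -> list[str]:
    """Return one SELECT per instance; together they cover every row exactly once."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # T-SQL % keeps the sign of the dividend, so ABS() maps negative keys
    # into the same 0..instances-1 range and no row falls through the cracks.
    return [
        f"SELECT * FROM {table} WHERE ABS({key} % {instances}) = {i}"
        for i in range(instances)
    ]


def output_file(prefix: str, instance: int) -> str:
    """Name of the sequential text file one instance writes (hypothetical scheme)."""
    return f"{prefix}_{instance:02d}.txt"


if __name__ == "__main__":
    for i, sql in enumerate(partition_queries("CUSTOMER", "CUSTOMER_ID", 4)):
        print(output_file("customer_extract", i), "<-", sql)
```

Modulo on a key is only one way to slice; key ranges work just as well if the key is dense and roughly evenly distributed.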

Now, in another job, use the Sequential File stage and, in a filter command or a before-transformer/before-job routine, concatenate all of the text files together. Stream the output into the hash file. Make sure your hash file stays under the 2 GB limit; otherwise create it as 64BIT. Either way, set the minimum modulus high enough that the file isn't constantly resizing.
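To decide between a standard and a 64BIT hash file, and to pick a starting minimum modulus, a back-of-the-envelope estimate is enough. A rough sketch, assuming a dynamic hashed file with a 2048-byte group size and an 80% split load (assumed defaults; check them against your own DataStage documentation), plus a hypothetical overhead allowance for record headers and free space. The average row size is your own estimate; nothing here reads the actual file.

```python
# Assumed defaults, to be checked against your DataStage documentation:
LIMIT_32BIT_BYTES = 2**31   # 32-bit hashed files cannot grow past about 2 GB
GROUP_SIZE_BYTES = 2048     # assumed default group size of a dynamic hashed file
SPLIT_LOAD_PCT = 80         # assumed default split load, in percent


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on large byte counts)."""
    return -(-a // b)


def size_hashed_file(rows: int, avg_row_bytes: int, overhead_pct: int = 120) -> dict:
    """Rough sizing for a dynamic hashed file.

    rows and avg_row_bytes are your own estimates; overhead_pct is a
    hypothetical allowance for record headers and free space (120 = +20%).
    """
    est_bytes = ceil_div(rows * avg_row_bytes * overhead_pct, 100)
    # Enough groups that each one starts at or below the split load,
    # so the file is not splitting (resizing) over and over while loading.
    min_modulus = ceil_div(est_bytes * 100, GROUP_SIZE_BYTES * SPLIT_LOAD_PCT)
    return {
        "estimated_bytes": est_bytes,
        "needs_64bit": est_bytes >= LIMIT_32BIT_BYTES,
        "minimum_modulus": min_modulus,
    }


if __name__ == "__main__":
    # 13,000,000 rows is from the thread; 200 bytes per row is a made-up figure.
    print(size_hashed_file(13_000_000, 200))
```

With those made-up inputs the estimate comes out above 2 GB, which is the case where the advice above is to create the file as 64BIT.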

ray.wurlod · Post by **ray.wurlod** » Sat Nov 19, 2005 1:12 pm

It may be sufficient just to concatenate the files together using an operating system command (such as type in Windows or cat in UNIX).
Any reason you can't use the SQL Server bulk loader? If you can, then these text files could serve as the data files, with no need for additional processing, hashed files, temporary tables, hold areas, etc.
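If you would rather drive the concatenation from a script than from type or cat, a minimal Python equivalent follows. It copies the part files byte for byte, in order, into one target file; the file names are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate(parts: list[Path], target: Path) -> int:
    """Append each part file to target in order, like 'cat a b c > target'.

    Returns the number of bytes written. Files are copied as raw bytes, so
    line endings and encoding stay exactly as the extract jobs wrote them.
    """
    written = 0
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
            written += part.stat().st_size
    return written


if __name__ == "__main__":
    # Hypothetical names: one extract file per job instance.
    parts = sorted(Path(".").glob("customer_extract_*.txt"))
    total = concatenate(parts, Path("customer_extract_all.txt"))
    print(f"wrote {total} bytes from {len(parts)} files")
```

Because the bytes are copied unchanged, each part file should end with a newline; otherwise the last row of one file runs into the first row of the next.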

roy · Post by **roy** » Sun Nov 20, 2005 1:49 am

Hi,
As for the original question:
This is usually caused by one of two things:
1. A real network problem.
2. You overloaded your machine to the point where it couldn't handle the load without reaching the timeout.

If this recurs with an increase in workload, or with specific additional work that was running on that particular occasion, then you're probably dealing with the second case.

Having said that, what our experts have already said should help you avoid reaching this situation in the first place.

If you can identify specific combinations of jobs that can't run simultaneously for such reasons, you might want to design mechanisms to enforce this (such as the DS internal semaphore mechanism), but that is another topic.
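In general terms, keeping certain jobs from running together is a concurrency limit. As a generic analogue only (this is not the DataStage semaphore mechanism, whose interface the thread doesn't show), here is a Python sketch that uses threading.BoundedSemaphore to let at most a given number of hypothetical jobs run at once; the class name and the sleep standing in for real work are illustrative.

```python
import threading
import time
from typing import Callable


class JobGate:
    """Allow at most `limit` jobs to run at the same time (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0                  # highest number of jobs seen running together
        self.finished: list[str] = []  # names of jobs that have completed

    def run(self, name: str, work: Callable[[], None]) -> None:
        with self._slots:              # blocks here until a slot is free
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                work()
            finally:
                with self._lock:
                    self._running -= 1
                    self.finished.append(name)


if __name__ == "__main__":
    gate = JobGate(limit=2)
    threads = [
        threading.Thread(target=gate.run, args=(f"extract_{i}", lambda: time.sleep(0.1)))
        for i in range(5)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("peak concurrency:", gate.peak, "finished:", len(gate.finished))
```

Five jobs are submitted, but the semaphore never lets more than two hold a slot at the same moment; the rest wait rather than piling load onto the server.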

IHTH,

Precious · Post by **Precious** » Sun Nov 20, 2005 5:45 am

Thanks for the suggestions, will give it a go.

Kind regards,

woochuli · Post by **woochuli** » Tue Dec 06, 2005 10:16 pm

hi..

I found this troubleshooting article on the DataDirect home page.
(Home->Support->TroubleShooting->Knowledgebase)

Search for "general network error".

=====================================
DataDirect KnowledgeBase
Unable to connect to SQL Server named instance from UNIX

Document Number: 2458483PF
Defect Number: DEF0001667,DEF0002070
Category: Patch
Created: 02/02/2005
Last Modified: 11/29/2005

Product Information
Product: Connect for ODBC; Connect64 for ODBC
Version: 5.0.00; 5.1.00
Operating System: AIX; HP UX; Linux; Solaris; HP UX IPF; Linux x64
Database: SQL Server
Document Details

When you attempt to connect to a SQL Server named instance from a UNIX machine using the syntax <IP address or Host Name>\<instance name>, you get the following errors:

SQLSTATE = 08001
NATIVE ERROR = 11
MSG = [DataDirect][ODBC SQL Server Driver][libssclient20]General network error. Check your network documentation.

SQLSTATE = 01000
NATIVE ERROR = 11
MSG = [DataDirect][ODBC SQL Server Driver][libssclient20]ConnectionOpen (10.30.11.24\SS2000()).

SQLSTATE = 01000
NATIVE ERROR = 0
MSG = [DataDirect][ODBC SQL Server Driver]The requested instance is either invalid or not running

However, you are able to connect to the named instance using the syntax <IP address or Host Name>,<Port>.

Defect DEF0001667 was filed and fixed to resolve this issue in Connect for ODBC 5.0. Download the appropriate platform patch to apply the fix.

To work around the problem:

1) Specify the Named Instance's port number as referenced in KB doc 2470133DH.
or

2) Remove Named Pipes from the enabled protocols for the named instance by opening SQL Server's Server Network Utility, which is available only on the server machine. Then choose the named instance from the Instance drop-down list and disable all protocols except TCP/IP.

=========================================
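The first workaround amounts to giving the driver host,port instead of host\instance. A small sketch of that choice, assuming you know the named instance's TCP port; the helper name and the port number in the test are hypothetical, while the IP address and instance name come from the KB excerpt. Note that the KB note concerns DataDirect drivers on UNIX, whereas the original errors came from the Microsoft ODBC SQL Server driver.

```python
from typing import Optional


def sql_server_address(host: str, instance: Optional[str] = None,
                       port: Optional[int] = None) -> str:
    """Build a SQL Server address value for a DSN (hypothetical helper).

    Per the KB note, host\\instance can fail while host,port works,
    so the port form is preferred whenever the instance's port is known.
    """
    if port is not None:
        if not 0 < port < 65536:
            raise ValueError(f"invalid TCP port: {port}")
        return f"{host},{port}"
    if instance:
        return f"{host}\\{instance}"
    return host


if __name__ == "__main__":
    # 1533 is a made-up port; look up the named instance's real port.
    print(sql_server_address("10.30.11.24", "SS2000", 1533))
```

Only the address string is built here; where it goes (DSN entry, connection string attribute) depends on the driver.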

Precious · Post by **Precious** » Tue Dec 06, 2005 11:59 pm

Thanks,

DSXchange
