Communication link failure

Precious
Charter Member
Posts: 53
Joined: Mon Aug 23, 2004 9:51 am
Location: South Africa
Post by Precious »

Hi all,

The job started at 2:08:56 PM, ran till 4:39:20 PM and then aborted on the following errors:

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]ConnectionRead (recv()).

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. Check your network documentation.

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]ConnectionRead (recv()).

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. Check your network documentation.

CUSTOMER_FACT_0000_HASH..TFM1: SQLFetch: Error retrieving results from server. 

CUSTOMER_FACT_0000_HASH..TFM1: [Microsoft][ODBC SQL Server Driver]Communication link failure
SQLTransact: Error committing transaction.

OS Client - Windows 2000 Terminal
OS Server - Windows Server 2003 DataCentre Edition
Database - MS SQL Server 2000


The job reads from an MS SQL Server database and writes to a hashfile.
A DRS Stage is being used to read from the database with the array size set at 1,000,000.
This job has run successfully once before, writing out more than 13,000,000 rows to the hashfile.

Is this a DataStage error or a database error? The DataStage engine and the database server are sitting on the same box.
This was not the only job involved. The other jobs are set up on more or less the same principle. They all failed at different times, but with the same message.

Thanx,
Precious

Mosher's Law of Software Engineering: Don't worry if it doesn't work right. If everything did, you'd be out of a job.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

It seems that something timed out, either the TCP/IP connection or the database commit.

1. What is your commit frequency?
2. Is the problem reproducible, either at the same number of rows processed or after a similar runtime for the same job? This might help show whether it is the DB or not.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The job ran for 2 1/2 hours trying to pull out 13 million rows. Doesn't this seem to be an excessive task? This means for 2 1/2 hours you have an open cursor on 13 million source rows holding all of that data in rollback. This is bad. You need to NOT have that data held that long, I don't care what database you are using. Get the data out faster by writing to a Sequential text file. You'll see that the job flies when not hashing all of that data. Use multiple job instances simultaneously to get different portions of that data into separate text files. You'll find that you have nX faster results because you have more data getting out of the database simultaneously.
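
One way to give each job instance its own portion of the data is to split a numeric key range into contiguous slices and put one slice in each instance's WHERE clause. The following is only a sketch under assumptions: the thread does not say how the source table is keyed, so the column name CUSTOMER_ID and the helper names are hypothetical.

```python
def split_key_range(low, high, parts):
    """Split the inclusive integer range [low, high] into contiguous sub-ranges."""
    if parts < 1 or high < low:
        raise ValueError("need parts >= 1 and high >= low")
    total = high - low + 1
    parts = min(parts, total)           # never create empty slices
    base, extra = divmod(total, parts)  # the first `extra` slices get one more key
    ranges = []
    start = low
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clause(column, bounds):
    """Build the filter one job instance would use (column name is hypothetical)."""
    lo, hi = bounds
    return f"{column} BETWEEN {lo} AND {hi}"


if __name__ == "__main__":
    # Four instances, each writing its slice to its own text file.
    for bounds in split_key_range(1, 13_000_000, 4):
        print(where_clause("CUSTOMER_ID", bounds))
```

Each slice could be passed to a job instance as a pair of job parameters. This assumes the key values are spread fairly evenly; if they are not, the slices will be uneven in row count.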

Now, in another job, use the Sequential File stage and, in a filter or in a before-transformer/before-job step, concatenate all of the text files together. Stream the output into the hash file. Make sure your hash file is under the 2.2 GB limit; otherwise create it as 64BIT. No matter what, set the minimum modulus high enough so that you're not resizing a lot.
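
A back-of-the-envelope check can show whether the hash file is likely to cross the limit and roughly what minimum modulus to start from. This is a sketch only: the average record size is a guess you must supply, the 2.2 GB figure is read as decimal gigabytes, and the 2048-byte group with an 80% fill target is an assumed rule of thumb, not something stated in this thread. File overhead is ignored.

```python
import math

# "2.2 GB" as quoted in the thread, read as decimal gigabytes (assumption).
LIMIT_BYTES = 2_200_000_000


def estimate_hashed_file(rows, avg_record_bytes, group_bytes=2048, fill=0.8):
    """Estimate data volume, 64-bit need, and a starting minimum modulus.

    group_bytes and fill are assumed rule-of-thumb values.
    """
    data_bytes = rows * avg_record_bytes
    min_modulus = math.ceil(data_bytes / (group_bytes * fill))
    return {
        "data_bytes": data_bytes,
        "needs_64bit": data_bytes > LIMIT_BYTES,
        "min_modulus": min_modulus,
    }


if __name__ == "__main__":
    # 13 million rows at a guessed 200 bytes per record (key plus data).
    print(estimate_hashed_file(13_000_000, 200))
```

Treat the result as a starting point and check it against the real file once the job has loaded a representative volume.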
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It may be sufficient just to concatenate the files together using an operating system command (such as type in Windows or cat in UNIX).
Any reason you can't use the SQL Server bulk loader? If you can, then these text files could serve as the dat files, with no need for additional processing, hashed files, temporary tables, hold areas, etc.
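
If a script is preferred over type or cat, the same concatenation can be sketched in a few lines of Python. The file names below are hypothetical, and the sketch assumes every part file ends with a newline so that rows do not run together.

```python
import shutil
from pathlib import Path


def concatenate(parts, target):
    """Append each part file to target, in order, as raw bytes."""
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, no full read
    return Path(target)


if __name__ == "__main__":
    # Hypothetical names for the extracts written by the job instances.
    part_files = sorted(Path(".").glob("customer_part_*.txt"))
    concatenate(part_files, "customer_all.txt")
```

Copying in binary mode keeps the data byte-for-byte identical, which matters if the extracts use a specific code page or line terminator.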
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
As for the original question:
This is usually caused by one of two things:
1. A real network problem.
2. You overloaded your machine to the point that it couldn't handle the load without reaching the timeout.

If this persists because of an increase in workload, or because of specific additional work that was running on that occasion, then you're probably dealing with the second case.

Having said that, what our experts have already suggested should help you avoid reaching this situation in the first place.

If you can identify specific combinations of jobs that can't run simultaneously for such reasons, you might want to design mechanisms to enforce this (such as the DS internal semaphore mechanism), but that is another topic.
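
To illustrate the general idea of keeping two heavy jobs from overlapping, here is a generic lock-file sketch. It is not the DS internal semaphore mechanism mentioned above, only a stand-in for the same principle; the lock path and the function name are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold a lock file for the duration of the block.

    Raises FileExistsError if another process already holds the lock.
    """
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    try:
        with exclusive_lock("heavy_extract.lock"):
            print("lock held; start the heavy job here")
    except FileExistsError:
        print("another heavy job is running; try again later")
```

A lock file left behind by a crashed process has to be cleaned up by hand, which is one reason a scheduler-level or engine-level mechanism is usually preferable.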

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Precious

Post by Precious »

Thanks for the suggestions, will give it a go.

Kind regards,
Precious

woochuli
Participant
Posts: 1
Joined: Mon Oct 04, 2004 10:45 pm
Location: Seoul, Republic of Korea

Re: Communication link failure

Post by woochuli »

hi..

I found this troubleshooting article on the DataDirect home page.
(Home->Support->TroubleShooting->Knowledgebase)

Search for "general network error".

=====================================
DataDirect KnowledgeBase
Unable to connect to SQL Server named instance from UNIX

Document Number: 2458483PF
Defect Number: DEF0001667,DEF0002070
Category: Patch
Created: 02/02/2005
Last Modified: 11/29/2005


Product Information
Product: Connect for ODBC; Connect64 for ODBC
Version: 5.0.00; 5.1.00
Operating System: AIX; HP UX; Linux; Solaris; HP UX IPF; Linux x64
Database: SQL Server
Document Details

When you attempt to connect to a SQL Server named instance from a UNIX machine using the syntax <IP address or Host Name>\<instance name>, you get the following errors:

SQLSTATE = 08001
NATIVE ERROR = 11
MSG = [DataDirect][ODBC SQL Server Driver][libssclient20]General network error. Check your network documentation.

SQLSTATE = 01000
NATIVE ERROR = 11
MSG = [DataDirect][ODBC SQL Server Driver][libssclient20]ConnectionOpen (10.30.11.24\SS2000()).

SQLSTATE = 01000
NATIVE ERROR = 0
MSG = [DataDirect][ODBC SQL Server Driver]The requested instance is either invalid or not running

However, you are able to connect to the named instance using the syntax <IP address or Host Name>,<Port>.



Defect DEF0001667 was filed and fixed to resolve this issue in Connect for ODBC 5.0. Download the appropriate platform patch to apply the fix.

To work around the problem, either:

1) Specify the Named Instance's port number as referenced in KB doc 2470133DH.

or

2) Remove Named Pipes from the enabled protocols for the Named Instance by opening SQL Server's Network Server Utility, which is available only on the server machine. Then choose the Named Instance from the Instance drop-down list and disable all protocols except TCP/IP.

=========================================
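
The two address forms the article contrasts can be sketched as a small helper. It only builds the server string; the function name is hypothetical, the port number in the example is made up, and the keyword your driver expects for this value is not covered here.

```python
def server_address(host, port=None, instance=None):
    """Return 'host,port' when a port is known, else 'host\\instance', else host."""
    if port is not None:
        return f"{host},{port}"       # the form the article reports as working
    if instance:
        return f"{host}\\{instance}"  # the form that triggers the reported errors
    return host


if __name__ == "__main__":
    # Host and instance name come from the article; port 1435 is hypothetical.
    print(server_address("10.30.11.24", instance="SS2000"))
    print(server_address("10.30.11.24", port=1435, instance="SS2000"))
```

Preferring the port whenever it is known mirrors the first workaround in the article.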
National Cancer Center
Samsung Card
LG Electronic
Ghil Hospital
Precious

Post by Precious »

Thanks,
Precious
