Communication Error

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Communication Error

Post by JezT »

I am running a job in DS whose source is a flat file containing around 8 million rows.

One of the fields is used as a Natural Key passed into a Shared Container to do a lookup and retrieve the corresponding Surrogate Key.

When I run the job, the 8 million rows run through the initial transformer, which passes the Natural Key into the Shared Container. The rows are then passed through a tmp file before moving to a second transformer stage, into which the Surrogate Key is passed. The problem seems to be that once around 4 million rows have passed through the container, the following error occurs:
UKWHBMISCAxxInbGLMTfrm1Job..ARRG_ID_Lkup.SurrKey_Write_Tfm: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "128.8.123.16". Communication function detecting the error: "recv". Protocol specific error code(s): "*", "*", "0". SQLSTATE=08001

SQLExecute: Error executing statement 'SELECT next value for DWHD019.DWH_ARRG_XREF_SURRID,-1 FROM SYSIBM.SYSDUMMY1'. See following DB2 message for details.
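For reference, that statement can be run on its own (e.g. from the DB2 command line processor) to check whether the sequence fetch itself fails outside of DataStage; this is just the statement quoted in the error above:

-- Schema and sequence names taken from the error message.
-- If this fails by itself, the problem is in DB2, not the job.
SELECT next value for DWHD019.DWH_ARRG_XREF_SURRID, -1
FROM SYSIBM.SYSDUMMY1;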
Do you think that I will need to stream my jobs to pass smaller volumes through, or should DS be able to cope with 8 million rows at a time?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You might be getting a timeout error in your job; DataStage really doesn't care about the number of rows. Is your "tmp" file a named pipe or a real disk file? About how long does this job run? What are your DB/2 timeouts?
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Post by JezT »

ArndW wrote:You might be getting a timeout error in your job; DataStage really doesn't care about the number of rows. Is your "tmp" file a named pipe or a real disk file? About how long does this job run? What are your DB/2 timeouts?
Arnd

It is a real disk file and the job runs for about 50 mins. There is no further information in the log, so it doesn't actually state what the DB2 error is.

JezT
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

JezT,

Try splitting your job in two: one job to create the tmp file, and a second to take that file and load it into DB/2. What might be happening is that the connection to DB/2 is opened but nothing is written to it for a long time (up to 50 minutes in your case), so by the time the first write is attempted the connection has timed out. You could also change the interim file to a named pipe, or temporarily remove the file write altogether, to see if the job then completes correctly...
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

It appears to be an SQL error from DB2. It may be that DB2 cannot return the next sequence number, for example because the sequence has reached its defined upper boundary (max limit).
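
A quick way to check is the sequence definition in the catalog. A minimal sketch, assuming the DB2 LUW catalog view SYSCAT.SEQUENCES is available on your version:

-- Check whether the sequence has hit its upper bound and whether it cycles.
SELECT SEQSCHEMA, SEQNAME, MAXVALUE, CYCLE
FROM SYSCAT.SEQUENCES
WHERE SEQSCHEMA = 'DWHD019'
AND SEQNAME = 'DWH_ARRG_XREF_SURRID';

If MAXVALUE has been reached and CYCLE is 'N', the next-value call will fail.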

Can you pass a new source key which does not exist in the surrogate table and try the shared container again?
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

We might have had the same problem, only we were using Oracle. I had to break 2 jobs up into multiple streams, which also made them run faster. ASCL could not figure it out, and we did not push the issue because it runs faster now. Both jobs got to about 10 million rows before they failed. I think it is a memory leak.
Mamu Kim