Communication Error

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Communication Error

Post by JezT »

I am running a job in DS whose source is a flat file containing around 8 million rows.

One of the fields is used as a Natural Key passed into a Shared Container to do a lookup and retrieve the corresponding Surrogate Key.

When I run the job, the 8 million rows run through the initial transformer, which passes the Natural Key into the Shared Container. The rows are then passed through a tmp file before moving to a second transformer stage, into which the Surrogate Key is passed. The problem seems to be that once around 4 million rows have passed through the container, the following error occurs:
UKWHBMISCAxxInbGLMTfrm1Job..ARRG_ID_Lkup.SurrKey_Write_Tfm: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "128.8.123.16". Communication function detecting the error: "recv". Protocol specific error code(s): "*", "*", "0". SQLSTATE=08001

SQLExecute: Error executing statement 'SELECT next value for DWHD019.DWH_ARRG_XREF_SURRID,-1 FROM SYSIBM.SYSDUMMY1'. See following DB2 message for details.
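For reference, that statement can be run on its own (e.g. from the DB2 command line processor) to check whether the sequence fetch itself fails outside of DataStage; this is just the statement quoted in the error above:

-- Schema and sequence names taken from the error message.
-- If this fails by itself, the problem is in DB2, not the job.
SELECT next value for DWHD019.DWH_ARRG_XREF_SURRID, -1
FROM SYSIBM.SYSDUMMY1;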
Do you think that I will need to stream my jobs to pass smaller volumes through, or should DS be able to cope with 8 million rows at a time?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You might be getting a timeout error in your job; DataStage really doesn't care about the number of rows. Is your "tmp" file a named pipe or a real disk file? About how long does this job run? What are your DB/2 timeouts?
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Post by JezT »

ArndW wrote:You might be getting a timeout error in your job; DataStage really doesn't care about the number of rows. Is your "tmp" file a named pipe or a real disk file? About how long does this job run? What are your DB/2 timeouts?
Arnd

It is a real disk file and the job runs for about 50 mins. There is no further information in the log, so it doesn't actually state what the DB2 error is.

JezT
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

JezT,

Try splitting your job in two: one job to create the tmp file, and a second to take that file and load it into DB/2. What might be happening is that the connection to DB/2 is opened but nothing is written to it for a long time (up to 50 minutes in your case), so by the time the first write is attempted the connection has timed out. You could also change the interim file to a named pipe, or temporarily remove the file write altogether, to see if the job then completes correctly...
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

It appears to be an SQL error from DB2. It may be that DB2 cannot return the next sequence number, for example because the sequence has reached its defined upper boundary (max limit).
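
A quick way to check is the sequence definition in the catalog. A minimal sketch, assuming the DB2 LUW catalog view SYSCAT.SEQUENCES is available on your version:

-- Check whether the sequence has hit its upper bound and whether it cycles.
SELECT SEQSCHEMA, SEQNAME, MAXVALUE, CYCLE
FROM SYSCAT.SEQUENCES
WHERE SEQSCHEMA = 'DWHD019'
AND SEQNAME = 'DWH_ARRG_XREF_SURRID';

If MAXVALUE has been reached and CYCLE is 'N', the next-value call will fail.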

Can you pass a new source key which does not exist in the surrogate table and try the shared container again?
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

We might have had the same problem, only we were using Oracle. I had to break 2 jobs up into multiple streams, which also made them run faster. ASCL could not figure it out, and we did not push the issue because it runs faster now. Both jobs got to about 10 million rows before they failed. I think it is a memory leak.
Mamu Kim