Job aborting when running with large data sets

sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Job aborting when running with large data sets

Post by sri1dhar »

The job reads from a DataSet and, based on a constraint, writes to one of nine Sybase tables. I am using the Sybase OC stage. The job runs fine when the source data set has about 100,000 rows, but aborts with the errors below when run with a data set of 470,000 rows.

TransUpsert,1: Unable to wait for job to finish - 81002.
TransUpsert,1: the runLocally() of operator [DSJobRun in TransUpsert], partition 1 of 2, processID 743 on node2 failed.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Have you checked the disk space on all the nodes where the dataset lands? Even so, it shouldn't give you this particular error message. :?
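If you want to watch the space while the job runs, a rough sketch like the one below can be left running in another session. The directories listed are only placeholders; substitute the resource and scratch paths from your own configuration file.

Code:

# Sketch only: the paths are placeholders for your own
# resource, scratch and /tmp areas.
DIRS="/tmp /data/scratch /data/resource"

# Poll free space every 30 seconds while the job runs (Ctrl-C to stop).
while true; do
    date
    df -k $DIRS
    sleep 30
done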
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What stage is "TransUpsert"? If it is the Sybase OC stage then you should, in addition to what kumar has already suggested in checking your scratch space, check your DB logs for unexpected errors.
sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Post by sri1dhar »

There is no problem with the disk space, either scratch or resource. TransUpsert is a Transformer stage.

When I replace the Sybase OC stages with a Copy stage or a Data Set stage, the job works fine.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

PX buffers data between stages (unlike Server, where pure pipes are used) and can, given the right conditions, spill this buffered data to disk. Have you actually monitored your /tmp and scratch areas while the job was running to make sure they aren't filling up? Also, how many nodes does your configuration file define, and does the error change or go away when you drop down to a single-node configuration file?
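For the single-node test, a sketch along the lines below is one way to set it up. The host name and the disk/scratch paths are only placeholders, and $APT_CONFIG_FILE is the environment variable (or job parameter) that tells the job which configuration file to use.

Code:

# Sketch only: "myhost" and the disk/scratch paths are placeholders
# for your own server name and resource/scratch directories.
cat > /tmp/one_node.apt <<'EOF'
{
    node "node1"
    {
        fastname "myhost"
        pools ""
        resource disk "/data/resource" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
EOF

# Point the next run at the one-node file, either by exporting the
# variable before the run or by setting it as a job parameter.
export APT_CONFIG_FILE=/tmp/one_node.apt

If the abort disappears on one node, that points at something partition-related (or at resources on the second node) rather than at the Sybase stage itself.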
clickart
Premium Member
Posts: 15
Joined: Tue Oct 18, 2005 10:14 pm
Location: Schaumburg, IL

Post by clickart »

This looks like a UNIX process time-out error. We used to get a similar error when a DataStage job kept running longer than the time-out limit set in UNIX.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

clickart wrote:This looks like a UNIX process time-out error. We used to get a similar error when a DataStage job kept running longer than the time-out limit set in UNIX.
Could you elaborate on this? May I know which UNIX parameter sets the time-out limit for a process?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
clickart
Premium Member
Posts: 15
Joined: Tue Oct 18, 2005 10:14 pm
Location: Schaumburg, IL

Post by clickart »

I'm sorry; since we encountered this issue several months back, I couldn't recall it correctly earlier.
The time-out value was actually set in DS Administrator. The "Inactivity timeout" value was initially set too low, which caused the UNIX process to abort even though the DataStage job itself was running fine.
When we increased the inactivity timeout value, we didn't encounter this issue again.
To be exact, this was in an AIX environment.
Hope this info helps.