
Job aborting when running with large data sets

Posted: Tue Aug 01, 2006 10:55 am
by sri1dhar
The job reads from a DataSet and, based on a constraint, writes to one of nine Sybase tables. I am using the Sybase OC stage. The job runs fine when the source data set has about 100,000 rows, but aborts with the errors below when run against a data set with 470,000 rows.

TransUpsert,1: Unable to wait for job to finish - 81002.
TransUpsert,1: the runLocally() of operator [DSJobRun in TransUpsert], partition 1 of 2, processID 743 on node2 failed.

Posted: Tue Aug 01, 2006 10:24 pm
by kumar_s
Have you checked the disk space on all the nodes where the data set lands? Still, that shouldn't give you this message.

Posted: Wed Aug 02, 2006 1:40 am
by ArndW
What stage is "TransUpsert"? If it is the Sybase OC stage, then in addition to checking your scratch space as kumar has already suggested, you should check your DB logs for unexpected errors.

Posted: Wed Aug 02, 2006 7:34 am
by sri1dhar
There is no problem with disk space, either scratch or resource. TransUpsert is a Transformer stage.

When I replace the Sybase OC stages with Copy stages or data sets, the job works fine.

Posted: Wed Aug 02, 2006 7:57 am
by ArndW
PX buffers data between stages (unlike server jobs, which use pure pipes) and can, under the right conditions, overflow this buffered data to disk. Have you actually monitored your /tmp and scratch areas while the job was running to make sure they aren't filling up? Also, how many nodes does your configuration file define, and does the error change or go away when you switch to a single-node configuration file?
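If you want to follow up on the suggestion to watch /tmp and the scratch areas while the job runs, a small polling script can do it. This is a minimal sketch, assuming Python 3 is available on each node; the script name watch_disk.py, the 90% warning threshold, and the sampling interval are hypothetical choices, and the paths you pass in should be the scratch and resource disk locations from your own configuration file.

```python
import shutil
import sys
import time
from datetime import datetime


def usage_report(paths):
    """Return (path, percent_used, free_bytes) for each path that exists."""
    report = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except FileNotFoundError:
            continue  # skip paths that do not exist on this node
        percent_used = 100.0 * usage.used / usage.total
        report.append((path, percent_used, usage.free))
    return report


def monitor(paths, interval_s=5.0, samples=120, warn_pct=90.0):
    """Sample disk usage every interval_s seconds; return peak percent used per path."""
    peaks = {}
    for _ in range(samples):
        stamp = datetime.now().strftime("%H:%M:%S")
        for path, pct, free in usage_report(paths):
            peaks[path] = max(peaks.get(path, 0.0), pct)
            # Hypothetical threshold: flag any path at or above warn_pct
            flag = "  <-- WARNING" if pct >= warn_pct else ""
            print(f"{stamp} {path}: {pct:5.1f}% used, {free // 2**20} MiB free{flag}")
        time.sleep(interval_s)
    return peaks


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python watch_disk.py /tmp /path/to/scratch
    for path, peak in monitor(sys.argv[1:]).items():
        print(f"peak usage for {path}: {peak:.1f}%")
```

Start it just before running the job against the 470,000-row data set. A peak close to 100% on any watched path would suggest that buffered data spilling to disk is filling that area; a flat profile would point back toward the database or time-out explanations.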

Posted: Wed Aug 02, 2006 4:48 pm
by clickart
This looks like a UNIX process time-out error. We used to get a similar error when a DataStage job kept running longer than the time-out limit set in UNIX.

Posted: Wed Aug 02, 2006 9:42 pm
by kumar_s
clickart wrote: This looks like a UNIX process time-out error. We used to get a similar error when the DataStage job continues running more than the time-out limit set in UNIX.
Could you elaborate on this? Which UNIX parameter controls the time-out limit for a process?

Posted: Thu Aug 03, 2006 3:15 pm
by clickart
I'm sorry. Since we encountered this issue several months back, I couldn't recall it correctly earlier.
The time-out value was actually set in DS Administrator. The "inactivity timeout" value was initially set low, which caused the UNIX process to abort even though the DataStage job itself was running fine.
When we increased the inactivity timeout value, we didn't encounter this issue.
To be exact, this was in an AIX environment.
Hope this info helps.