Job aborting when running with large data sets

sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Job aborting when running with large data sets

Post by sri1dhar »

The job reads from a DataSet and, based on a constraint, writes to one of nine Sybase tables. I am using the Sybase OC stage. The job runs fine when the source data set has about 100,000 rows, but aborts with the errors below when run with a data set of 470,000 rows.

TransUpsert,1: Unable to wait for job to finish - 81002.
TransUpsert,1: the runLocally() of operator [DSJobRun in TransUpsert], partition 1 of 2, processID 743 on node2 failed.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Have you checked the disk space on all the nodes where the dataset lands? Even so, it shouldn't give you this particular error message. :?
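If you want to watch the space while the job runs, a rough sketch like the one below can be left running in another session. The directories listed are only placeholders; substitute the resource and scratch paths from your own configuration file.

Code:

# Sketch only: the paths are placeholders for your own
# resource, scratch and /tmp areas.
DIRS="/tmp /data/scratch /data/resource"

# Poll free space every 30 seconds while the job runs (Ctrl-C to stop).
while true; do
    date
    df -k $DIRS
    sleep 30
done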
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What stage is "TransUpsert"? If it is the Sybase OC stage then you should, in addition to what kumar has already suggested in checking your scratch space, check your DB logs for unexpected errors.
sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Post by sri1dhar »

There is no problem with the disk space, either scratch or resource. TransUpsert is a Transformer stage.

When I replace the Sybase OC stages with a Copy stage or a Data Set stage, the job works fine.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

PX buffers data between stages (unlike Server, where pure pipes are used) and can, given the right conditions, spill this buffered data to disk. Have you actually monitored your /tmp and scratch areas while the job was running to make sure they aren't filling up? Also, how many nodes does your configuration file define, and does the error change or go away when you drop down to a single-node configuration file?
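For the single-node test, a sketch along the lines below is one way to set it up. The host name and the disk/scratch paths are only placeholders, and $APT_CONFIG_FILE is the environment variable (or job parameter) that tells the job which configuration file to use.

Code:

# Sketch only: "myhost" and the disk/scratch paths are placeholders
# for your own server name and resource/scratch directories.
cat > /tmp/one_node.apt <<'EOF'
{
    node "node1"
    {
        fastname "myhost"
        pools ""
        resource disk "/data/resource" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
EOF

# Point the next run at the one-node file, either by exporting the
# variable before the run or by setting it as a job parameter.
export APT_CONFIG_FILE=/tmp/one_node.apt

If the abort disappears on one node, that points at something partition-related (or at resources on the second node) rather than at the Sybase stage itself.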
clickart
Premium Member
Posts: 15
Joined: Tue Oct 18, 2005 10:14 pm
Location: Schaumburg, IL

Post by clickart »

This looks like a UNIX process time-out error. We used to get a similar error when a DataStage job kept running longer than the time-out limit set in UNIX.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

clickart wrote:This looks like a UNIX process time-out error. We used to get a similar error when a DataStage job kept running longer than the time-out limit set in UNIX.
Could you elaborate on this? May I know which UNIX parameter sets the time-out limit for a process?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
clickart
Premium Member
Posts: 15
Joined: Tue Oct 18, 2005 10:14 pm
Location: Schaumburg, IL

Post by clickart »

I'm sorry; since we encountered this issue several months back, I couldn't recall it correctly earlier.
The time-out value was actually set in DS Administrator. The "Inactivity timeout" value was initially set too low, which caused the UNIX process to abort even though the DataStage job itself was running fine.
When we increased the inactivity timeout value, we didn't encounter this issue again.
To be exact, this was in an AIX environment.
Hope this info helps.