Getting a "No space left on device" error

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Getting a "No space left on device" error

Post by abc123 »

I am getting this error despite there being plenty of space on my box. The two volumes involved, /dspublic and /dsresource, have 64 GB and 131 GB of space. My job design is:

dataset--> standardization-->Transform1-->Transform2-->Merge-->
Transform-->Dataset

The dataset has 10 million+ rows, but the job aborts at around 8.3 million.

Any ideas?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Have you monitored /tmp (and the other drives) while the job is running, to see if any of them grows to 100% and then, after the job aborts, rapidly shrinks again?
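One way to do that kind of monitoring is a small shell loop that timestamps df -k output while the job runs. This is a minimal sketch; the paths, interval, and log file name below are illustrative, not from this thread:

```shell
# snapshot_space: print a timestamp followed by df -k for the given
# file systems (or for all mounted file systems if none are given).
snapshot_space() {
    date
    df -k "$@"
}

# While the job runs, sample every 10 seconds into a log you can
# read back after the abort to see which file system filled:
#   while true; do
#       snapshot_space /tmp /dspublic /dsresource
#       sleep 10
#   done >> /tmp/space_monitor.log
```

Comparing the last few snapshots before the abort against the first one shows which file system was growing.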
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

No, I haven't. I'll try that now. The specific error message I get is:

APT_CombinedOperatorController(0),0: Unsupported close in APT_FileBufferOutput::spillToNextFile(): No space left on device.

By the way, suppose /tmp does fill up completely during the run and the only way around the problem is to delete files from /tmp. Wouldn't that affect my data?
ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm

Post by ds_user78 »

Check the scratch disk and resource disk space - you should find these paths in the configuration file. Ensure there is enough space on those paths.
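For reference, those paths appear in the APT configuration file as resource disk and resource scratchdisk entries. A minimal single-node example is sketched below; the hostname and scratch path are illustrative, though the volume names match this thread:

```
{
    node "node1"
    {
        fastname "myserver"
        pools ""
        resource disk "/dsresource" {pools ""}
        resource scratchdisk "/dspublic/scratch" {pools ""}
    }
}
```

The scratchdisk path is where the parallel engine spills buffers when they outgrow memory, which is what the spillToNextFile message in the error refers to.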
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

By the way, I am working in a grid environment.
ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm

Post by ds_user78 »

Even in a grid environment, you should be able to see the resource disk and resource scratchdisk in the dynamic configuration file from the Director log. Ensure there is space on those paths.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

My scratch disk is empty, but the temp space has 9.5 GB of files. Not a whole lot, but it can be cleared. I'll run the job and monitor the temp space during the run.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

I am running the job now. It is about 3 million rows in, and I see no difference in the scratch, resource, or temp disk space. I took a snapshot at the start.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

My job aborted again. I monitored throughout, and there was apparently no change in scratch, resource, or temp space.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

So which disk filled? Looks like you need to monitor all file systems.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Thanks Ray for your response.

What's interesting is that I monitored the scratch, resource, and temp spaces throughout the run before the abort, and none of them showed any change. The problem was being caused by the Merge stage in my job. I split the job into two: the first job writes all the inputs of the merge to disk, and the second job loads those files and continues with the merge. Strangely enough, this worked.

How do I monitor specific file systems? I looked into the default config file and monitored the drives mentioned there. Since I am in a grid, should I monitor some other file systems?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes, you should monitor ALL file systems (use the df -k command repeatedly), because you don't know which one is filling.
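To avoid eyeballing every line of repeated df -k output, one option is to filter for file systems above a given usage threshold. This is a sketch, and the 90% limit is just an example; df -kP forces POSIX single-line output, where column 5 is the capacity ("Use%") and column 6 is the mount point:

```shell
# Print the mount point and Use% of any file system at or above the limit.
df -kP | awk -v limit=90 'NR > 1 {
    use = $5            # e.g. "95%"
    sub(/%/, "", use)   # strip the percent sign
    if (use + 0 >= limit) print $6, $5
}'
```

Run in a loop, this prints nothing until some file system crosses the threshold, which makes the culprit easy to spot in a log.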
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Since you are running in a grid environment, even if you issue a df -k command, you will still only see the /dspublic and /dsresource information on the Conductor node. No job is allowed to run on the Conductor node; that's why you saw "NOTHING" change when you were monitoring the job from that machine. You need root permission to issue the qstat command from the PBS Pro directory, which will give you every piece of information about your job, queue, and server.
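For anyone who does have access, a minimal sketch of that idea (the job id and node names are illustrative): PBS Professional's qstat -f prints the full job record, and its exec_host attribute names the compute nodes the job actually ran on, which are the nodes whose file systems need checking.

```shell
# exec_hosts: given a PBS job id, print the compute nodes it ran on.
# Relies on the "exec_host" attribute in qstat -f output (PBS Professional).
exec_hosts() {
    qstat -f "$1" | awk -F'= ' '/exec_host/ { print $2 }'
}

# Usage (job id and hostnames illustrative):
#   exec_hosts 1234.pbsserver    # e.g. compute01/0+compute02/0
#   ssh compute01 df -k          # then check space on each listed node
```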
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Good catch.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Leo, as you already know, I don't have access, so I am out of luck. There are files in the /dsresource directory that I want to delete to clear space, but once again, I don't have the rights.