How and where the scratch disk is defined

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

How and where the scratch disk is defined

Post by peep »

How and where is the scratch disk defined?

How are the parameters defined?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

The scratch filesystems used by your parallel jobs are defined within your parallel configuration files, as is well documented in the Parallel Job Developer's Guide section on Configuration Files.
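
For example, a minimal one-node configuration file might look like the sketch below (the server name and paths are hypothetical). The engine reads whichever file the APT_CONFIG_FILE environment variable points to:

Code: Select all

{
	node "node1"
	{
		fastname "yourserver"
		pools ""
		resource disk "/IIS/data/node1/datasets" {pools ""}
		resource scratchdisk "/IIS/scratch/node1" {pools ""}
	}
}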

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Please do not run around and find semi-related posts to hijack; just stick with this one (or your other related one) to get your questions answered. I removed this from the older post you found, as I thought it would work best here.

peep wrote:I am having the same issue. Scratch disk space is full.

Here is the config:

Code: Select all

{
	node "node1"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node1/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node1/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node1/buffer" {pools "buffer"}
	}
	node "node2"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node2/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node2/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node2/buffer" {pools "buffer"}
	}
	node "node3"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node3/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node3/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node3/buffer" {pools "buffer"}
	}
}
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Note that there is no default disk pool for scratchdisk in the configuration file that Craig posted. Perhaps there should be. It depends on what DataStage is being asked to do.
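
For illustration only, a default scratchdisk pool could be added to each node alongside the named pools; a sketch for node1 (the /scratch path is hypothetical):

Code: Select all

	node "node1"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node1/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node1/scratch" {pools ""}
		resource scratchdisk "/IIS/data/node1/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node1/buffer" {pools "buffer"}
	}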
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Well, we are working on XML files, and when we run the job we have this issue: "scratch disk is full".
We have a 100 GB disk...
What can be the reason? When I check, it shows 50% free space.


Please help...
Let me know your suggestions...
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

"we are working on xml files" is extremely vague as to what your job is doing. You can work on xml files with a text editor :)

Take time to analyze and list out the following:

1) What operations does your job perform that will cause it to use your defined scratch disks?
2) What is the volume of data you are processing (not only rows of data, but amount/number of bytes)?
3) Are the three resources defined in each node simply different directories on the same disks or are they separate disks?
4) How many other jobs are running in your environment at the same time as your job?

Are you and/or the system administrators watching disk usage while your job is running? If not, you should be, as that is when the space is actually consumed. Maybe only one of the file systems (node1, node2 or node3) is running out of space.
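
As a sketch (using the filesystem paths from the config posted above), a loop like this run alongside the job will show which scratch area is filling up:

Code: Select all

# Check free space on the scratch directories every 10 seconds while the job runs
while true; do
	df -k /IIS/data/node1/sort /IIS/data/node1/buffer \
	      /IIS/data/node2/sort /IIS/data/node2/buffer \
	      /IIS/data/node3/sort /IIS/data/node3/buffer
	sleep 10
done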

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

1) The job has a Sort stage, a Transformer, an Aggregator... an XML stage as well.
2) The three nodes are defined under the same directory (/IIS/data/node1, /IIS/data/node2, /IIS/data/node3) and are on the same disk.
3) There is no specific space allocated to the scratch disk. The data can expand up to 100 GB. (So do I need to change any settings in the DataStage clients that will let DataStage jobs use the full disk space without any restrictions?)
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

There is no specific space allocated to the scratch disk. The data can expand up to 100 GB.
Ahh... now some potentially useful information. Do the scratch and buffer directories reside on the same physical disks as the disk resources (/IIS/data/node1/datasets, /IIS/data/node2/datasets and /IIS/data/node3/datasets)? Is that 100 GB shared by all of these file systems? If so, then you MUST consider the size of your output datasets as part of the problem: they affect how much storage is available for scratch usage. So does any dataset that already exists from other jobs or job runs on the same disk.

Where are your source XML files stored? On the same disk as the datasets and the sort and buffer scratch storage? If so, they also factor into how much space is available for scratch usage.

Standard best practice recommends that scratch space be allocated its own disk storage when possible.

Sorts and buffers will use whatever space is available in the file systems listed within each logical node in your configuration file. Each logical node (i.e. job partition) will use the file systems allocated to it and not the file systems allocated to other nodes (if they do not share the same names).

Sort will use (as documented): 1) sort disk pools, then 2) default disk pools (you have none defined), then 3) $TMPDIR, then 4) /tmp. Buffers will use (as documented) either 1) the default disk pools or 2) buffer disk pools (only if defined, as in your case).
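
As a hypothetical aside, if sort ever fell through to $TMPDIR, you could point that fallback at a larger filesystem before starting the job:

Code: Select all

# Hypothetical path: redirect the sort fallback directory to a larger filesystem
export TMPDIR=/bigdisk/tmp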

Do your error messages say anything more than "scratch space full"? Such as which scratch space (which file system)?

You're using three partitions. How evenly distributed is your data among the three partitions?

You need to work with your sysadmins to monitor disk usage (at the directory level) while your job is running in order to determine what is consuming the majority of your disk space. You may simply find that you don't have enough storage to hold everything (scratch files, datasets, etc.) at the same time on the same disk and need to add more.
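
For example (a sketch, again using the directories from the posted config):

Code: Select all

# Summarize per-directory usage under each node's storage while the job runs
du -sk /IIS/data/node1/* /IIS/data/node2/* /IIS/data/node3/*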

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Hi all, thanks for your responses.

Now I know where and how the scratch disk is defined:

in the configuration file pointed to by APT_CONFIG_FILE.

We ran out of disk space because our buffer disk did not have enough space to support the job at run time.
We increased the space and that took care of it.

To add more information:
we have now moved our scratch disk to NFS and added the environment variables needed to support NFS.
Post Reply