How and where the scratch disk is defined

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

How and where the scratch disk is defined

Post by peep »

How and where is the scratch disk defined?

How are the parameters defined?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

The scratch filesystems used by your parallel jobs are defined within your parallel configuration files, as is well documented in the Parallel Job Developer's Guide section on Configuration Files.
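
For example, a minimal one-node configuration file might look like the sketch below (the server name and paths are hypothetical). The engine reads whichever file the APT_CONFIG_FILE environment variable points to:

Code: Select all

{
	node "node1"
	{
		fastname "yourserver"
		pools ""
		resource disk "/IIS/data/node1/datasets" {pools ""}
		resource scratchdisk "/IIS/scratch/node1" {pools ""}
	}
}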

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Please do not run around and find semi-related posts to hijack; just stick with this one (or your other related one) to get your questions answered. I removed this from the older post you found, as I thought it would work best here.

peep wrote:I am having the same issue. Scratch disk space is full.

Here is the config:

Code: Select all

{
	node "node1"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node1/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node1/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node1/buffer" {pools "buffer"}
	}
	node "node2"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node2/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node2/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node2/buffer" {pools "buffer"}
	}
	node "node3"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node3/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node3/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node3/buffer" {pools "buffer"}
	}
}
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Note that there is no default disk pool for scratchdisk in the configuration file that Craig posted. Perhaps there should be. It depends on what DataStage is being asked to do.
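
For illustration only, a default scratchdisk pool could be added to each node alongside the named pools; a sketch for node1 (the /scratch path is hypothetical):

Code: Select all

	node "node1"
	{
		fastname "edrnpr17"
		pools ""
		resource disk "/IIS/data/node1/datasets" {pools ""}
		resource scratchdisk "/IIS/data/node1/scratch" {pools ""}
		resource scratchdisk "/IIS/data/node1/sort" {pools "sort"}
		resource scratchdisk "/IIS/data/node1/buffer" {pools "buffer"}
	}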
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Well, we are working on XML files, and when we run the job we have this issue: "scratch disk is full".
We have a 100 GB disk...
What can be the reason? When I check, it shows 50% free space.


Please help...
Let me know your suggestions...
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

"we are working on xml files" is extremely vague as to what your job is doing. You can work on xml files with a text editor :)

Take time to analyze and list out the following:

1) What operations does your job perform that will cause it to use your defined scratch disks?
2) What is the volume of data you are processing (not only rows of data, but amount/number of bytes)?
3) Are the three resources defined in each node simply different directories on the same disks or are they separate disks?
4) How many other jobs are running in your environment at the same time as your job?

Are you and/or the system administrators watching disk usage while your job is running? If not, you should be, as that is when the space is actually consumed. Maybe only one of the file systems (node1, node2 or node3) is running out of space.
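
As a sketch (using the filesystem paths from the config posted above), a loop like this run alongside the job will show which scratch area is filling up:

Code: Select all

# Check free space on the scratch directories every 10 seconds while the job runs
while true; do
	df -k /IIS/data/node1/sort /IIS/data/node1/buffer \
	      /IIS/data/node2/sort /IIS/data/node2/buffer \
	      /IIS/data/node3/sort /IIS/data/node3/buffer
	sleep 10
done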

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

1) The job has a Sort stage, a Transformer, an Aggregator... an XML stage as well.
2) The three nodes are defined under the same directory (/IIS/data/node1, /IIS/data/node2, /IIS/data/node3) and are on the same disk.
3) There is no specific space allocated to the scratch disk. The data can expand up to 100 GB. (So do I need to change any settings in the DataStage clients that will let DataStage jobs use the full disk space without any restrictions?)
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

There is no specific space allocated to the scratch disk. The data can expand up to 100 GB.
Ahh... now some potentially useful information. Do the scratch and buffer directories reside on the same physical disks as the disk resources (/IIS/data/node1/datasets, /IIS/data/node2/datasets and /IIS/data/node3/datasets)? Is that 100 GB shared by all of these file systems? If so, then you MUST consider the size of your output datasets as part of the problem: they affect how much storage is available for scratch usage. So does any dataset that already exists from other jobs or job runs on the same disk.

Where are your source XML files stored? On the same disk as the datasets and the sort and buffer scratch storage? If so, they also factor into how much space is available for scratch usage.

Standard best practice recommends that scratch space be allocated its own disk storage when possible.

Sorts and buffers will use whatever space is available in the file systems listed within each logical node in your configuration file. Each logical node (i.e. job partition) will use the file systems allocated to it and not the file systems allocated to other nodes (if they do not share the same names).

Sort will use (as documented): 1) sort disk pools, then 2) default disk pools (you have none defined), then 3) $TMPDIR, then 4) /tmp. Buffers will use (as documented) either 1) the default disk pools or 2) buffer disk pools (only if defined, as in your case).
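
As a hypothetical aside, if sort ever fell through to $TMPDIR, you could point that fallback at a larger filesystem before starting the job:

Code: Select all

# Hypothetical path: redirect the sort fallback directory to a larger filesystem
export TMPDIR=/bigdisk/tmp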

Do your error messages say anything more than "scratch space full"? Such as which scratch space (which file system)?

You're using three partitions. How evenly distributed is your data among the three partitions?

You need to work with your sysadmins to monitor disk usage (at the directory level) while your job is running in order to determine what is consuming the majority of your disk space. You may simply find that you don't have enough storage to hold everything (scratch files, datasets, etc.) at the same time on the same disk and need to add more.
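
For example (a sketch, again using the directories from the posted config):

Code: Select all

# Summarize per-directory usage under each node's storage while the job runs
du -sk /IIS/data/node1/* /IIS/data/node2/* /IIS/data/node3/*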

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
peep
Premium Member
Posts: 162
Joined: Mon Aug 20, 2012 6:52 pm

Post by peep »

Hi all, thanks for your responses.

Now I know where and how the scratch disk is defined:

in the configuration file pointed to by APT_CONFIG_FILE.

We ran out of disk space because our buffer disk did not have enough space to support the job at run time.
We increased the space and that took care of it.

To add more information:
we have now moved our scratch disk to NFS and added the environment variables needed to support NFS.
Post Reply