
DataStage Configuration File

Posted: Tue Oct 03, 2006 10:51 pm
by pankajg
I am trying to understand a few things related to the configuration file.

SAMPLE:
{
node "node1"
{
fastname "my_fastname"
pools ""
resource disk "/fs_stage_dataset/Datasets" {pools ""}
resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
}
}

I am trying to read this file as follows:

My default pool "" has the following resources available to it:

Disk which would be using the space at "/fs_stage_dataset/Datasets"
Scratch Disk which would be using the space at "/fs_stage_dataset/Scratch"

And all processing on node1 would use the resources made available by the default pool.



I would like to clarify what exactly a node pool and a disk pool are.
What is the default pool ""?

How are nodes associated with the pools?

Posted: Tue Oct 03, 2006 11:43 pm
by ray.wurlod
Nodes are grouped into pools in the configuration file. It is pointless to do so, however, with a single node configuration. The same applies to disk pools. A node pool is a subset of the possible nodes in a configuration; it makes more sense to use node pools in an MPP environment.

Stages can be restricted to using particular node pools and/or particular disk pools by specifying those pool names in the stage properties.
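
For example, a two-node MPP sketch with a named node pool might look like this (the fastnames "server_a" and "server_b" and the pool name "sort" are invented for illustration, not from any real system):

{
node "node1"
{
fastname "server_a"
pools "" "sort"
resource disk "/fs_stage_dataset/Datasets" {pools ""}
resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
}
node "node2"
{
fastname "server_b"
pools ""
resource disk "/fs_stage_dataset/Datasets" {pools ""}
resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
}
}

A stage constrained to the "sort" node pool would execute only on node1; an unconstrained stage executes on every node in the default pool "".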

Posted: Wed Oct 04, 2006 12:54 am
by pankajg
What would be the case when I have defined, say, 8 nodes with no pools defined? That is, I would only have one pool, which is the default pool.

How would it affect DS job execution?

Currently, my configuration is set to 8 nodes on a 16 CPU system (SMP), it has a SAN and all my disk and scratch space is set to fs_stage_dataset/Dataset or fs_stage_dataset/Scratch.

Node pools = "" (default)
Scratch pools = "" (default)
Disk pools = "" (default)

We have issues like SIGKILL and SIGBUS, and these are due to resource constraints. The most frequent errors we have are disk space full and scratch space full; others are heap overflow and the like. I believe this could be set right by modifying the configuration file.

Any suggestions or pointers on tuning the config file?

Also please note: I am not a premium poster, so I cannot see most of the answers hidden behind the premium content. Is there a way I can look at them? I believe not, but I request you to please post them as a message to me, if you would.

Thanks a tonne.

Posted: Wed Oct 04, 2006 7:26 am
by Kirtikumar
Instead of having 8 nodes directly, why don't you try to run the job with 1 node initially?

Once it finishes, increase the number of nodes gradually.

Errors like "disk is full" are thrown when the job is using a resource or scratch disk and there is not enough space for its files.
Resource disks are used while creating Data Sets, and scratch disks are used during Aggregator and Sort operations.
Check the space available for the directories listed as resource disk and scratch disk; you can use the df command for this, as below.
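
For example, using the paths from your post (the -k option, which reports in kilobytes, is available on most Unix flavours):

df -k /fs_stage_dataset/Datasets
df -k /fs_stage_dataset/Scratch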

The total number of nodes to use depends on many things, like the limits set for the user who is trying to run the job. E.g. if the memory limit for the user is 16 MB and the job tries to use more than that, then you will see these resource limit errors.

Check the limits for the user with which you are trying to run the job, as sketched below.
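
For example, in most Unix shells:

ulimit -a

This lists the per-user limits (data segment size, file size, memory and so on); compare them against what the job needs.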

Posted: Wed Oct 04, 2006 2:20 pm
by ray.wurlod
Use file systems with more space available.

For less than $1/week you can buy the right to read premium posts.

Re: DataStage Configuration File

Posted: Wed Oct 04, 2006 2:41 pm
by avi21st
pankajg wrote: I am trying to understand a few things related to the configuration file. [...] I would like to clarify what exactly a node pool and a disk pool are. What is the default pool ""? How are nodes associated with the pools?
Check one of the posts at viewtopic.php?p=188113&highlight=#188113

This might throw some light on config file design.

Posted: Wed Oct 04, 2006 3:15 pm
by ray.wurlod
Enrol in a training class. Read about configuration files in the Manager Guide.

OK, so how about posting some background info?

Posted: Wed Oct 04, 2006 7:13 pm
by jgreve
pankajg wrote: Currently, my configuration is set to 8 nodes on a 16 CPU system (SMP), it has a SAN and all my disk and scratch space is set to fs_stage_dataset/Dataset or fs_stage_dataset/Scratch.

We have issues like SIGKILL and SIGBUS, and these are due to resource constraints.

The most frequent errors we have are disk space full and scratch space full.

Others are heap overflow and the like.
I'm new to this config stuff, but...
It would help if you post your 8-node config file.
How many jobs do you have?
How many jobs are you running at the same time?
Do any of your jobs actually run to completion, or do they all blow up on resource issues?

I guess for one of the blowing-up jobs,
I'd like to see your OSH score (via APT_DUMP_SCORE; see below if you haven't used it before).
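
APT_DUMP_SCORE is a standard parallel environment variable; something like the following before the run (or the equivalent job- or project-level environment variable) should make the score appear in the job log:

export APT_DUMP_SCORE=True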

For starters,
I'd like to verify that your node names are all different.
I'm sure the "fastname" is the same for each node,
but if the node names are the same, you're running 8 players
on one CPU (at least I think that is how it works). Something shaped like the sketch below is what I'd expect.
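
That is (resource lines elided; "my_fastname" is just the value from your earlier sample):

node "node1" { fastname "my_fastname" pools "" ... }
node "node2" { fastname "my_fastname" pools "" ... }
...
node "node8" { fastname "my_fastname" pools "" ... }

Each node gets a unique name, all sharing the one physical hostname on an SMP box.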

You say you're hitting heap overflows; just how much RAM
does each of your CPUs have?
And what OS & version are you running on the SMP box?

Re: disk-full, that pretty much means it is
time to buy more disk.
Or... go delete some stuff.
What are your disk stats, anyway? Total, %used,
that sort of thing.
pankajg wrote: I believe this could be set right by modifying the configuration file.
Well, I certainly admire your optimism :)
I am a tiny bit skeptical that config-file
changes will fix 100% of your headaches.
But perhaps you can convince me.

What kinds of jobs are you running?
What else is running on your SMP box?
Is it ONLY DataStage? If yes, how do you know
that it is only running DataStage?

Among the parallel environment variables, you might try this:
APT_THIN_SCORE: this variable can reduce parallel job memory
usage (only useful if you're exhausting real memory,
or need extra memory for sorting / buffering). A sketch of setting it is below.
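
For instance, assuming a Unix shell (as far as I know it only matters that the variable is set, not what value it holds):

export APT_THIN_SCORE=1

You could equally add it as a project-level environment variable through the Administrator.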

Good luck,
John G.

Posted: Thu Oct 05, 2006 7:38 am
by pankajg
Thank you all for your replies.