DataStage Configuration File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pankajg
Participant
Posts: 39
Joined: Mon Jun 05, 2006 5:24 am
Location: India

DataStage Configuration File

Post by pankajg »

I am trying to understand a few things related to the configuration file.

SAMPLE:
{
    node "node1"
    {
        fastname "my_fastname"
        pools ""
        resource disk "/fs_stage_dataset/Datasets" {pools ""}
        resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
    }
}

I am trying to read this file as follows:

My default pool "" has the following resources available to it:

A disk, which would use the space at "/fs_stage_dataset/Datasets"
A scratch disk, which would use the space at "/fs_stage_dataset/Scratch"

And all processing on node1 would use the resources made available by the default pool.



I would like to clarify what exactly a node pool and a disk pool are.
What is the default pool ""?

How are nodes associated with the pools?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Nodes are grouped into pools in the configuration file. It is pointless to do so, however, with a single-node configuration. The same applies to disk pools. A node pool is a subset of the possible nodes in a configuration; it makes more sense to use node pools in an MPP environment.

Stages can be restricted to using particular node pools and/or particular disk pools by specifying those pool names in the stage properties.
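
As a sketch (the pool name, second node and paths here are invented for the example), a two-node configuration that defines a named node pool alongside the default pool "" might look like this:

{
    node "node1"
    {
        fastname "my_fastname"
        pools "" "sort_pool"
        resource disk "/fs_stage_dataset/Datasets" {pools ""}
        resource scratchdisk "/fs_stage_dataset/Scratch" {pools "" "sort_pool"}
    }
    node "node2"
    {
        fastname "my_fastname"
        pools ""
        resource disk "/fs_stage_dataset/Datasets" {pools ""}
        resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
    }
}

Here node1 belongs to both the default node pool "" and the named pool "sort_pool", while node2 belongs only to the default pool; the scratch disk on node1 likewise belongs to a named disk pool. A stage whose node pool constraint names "sort_pool" runs only on node1; a stage with no constraint runs on every node in the default pool "", which is simply the unnamed pool that nodes and resources fall into unless you say otherwise.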
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pankajg
Participant
Posts: 39
Joined: Mon Jun 05, 2006 5:24 am
Location: India

Post by pankajg »

What would be the case when I have defined, say, 8 nodes with no pools defined? That is, I would only have one pool, which is the default pool.

How would it affect DS job execution?

Currently, my configuration is set to 8 nodes on a 16-CPU system (SMP). It has a SAN, and all my disk and scratch space is set to fs_stage_dataset/Dataset or fs_stage_dataset/Scratch.

Node pools = "" (default)
Scratch pools = "" (default)
Disk pools = "" (default)
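
In config-file terms, I believe that corresponds to something like the following (two of the eight nodes shown, with the paths from my SAMPLE above; the node names are my guess, and the remaining six nodes repeat the same entries):

{
    node "node1"
    {
        fastname "my_fastname"
        pools ""
        resource disk "/fs_stage_dataset/Datasets" {pools ""}
        resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "my_fastname"
        pools ""
        resource disk "/fs_stage_dataset/Datasets" {pools ""}
        resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
    }
}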

We have issues like SIGKILL and SIGBUS, and these are due to resource constraints. The most frequent errors we get are disk space full and scratch space full; others are heap overflow and the like. I believe this could be corrected by modifying the configuration file.

Any suggestions or pointers on tuning the config file?

Also please note: I am not a premium poster, so I cannot see most of the answers hidden behind the premium content. Is there a way I can look at them? I believe not, but I request you to please post them as a message to me, if you would.

Thanks a tonne.
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

Instead of going straight to 8 nodes, why don't you try running the job with 1 node initially? The one-node SAMPLE at the top of this thread is exactly such a configuration; point the job's config file ($APT_CONFIG_FILE) at something like it.

Once the job finishes, increase the number of nodes gradually.

Errors like "disk is full" are thrown when the job is using one of the resource or scratch disks and there is not enough space for its files.
Resource disks are used when creating Datasets, and scratch disks are used during the Aggregate and Sort operations.
Check the space available in the directories mentioned as resource disk and scratch disk. You can use the command df for this.

The total number of nodes to be used depends on many things, like the limits set for the user who is trying to run the job. E.g. if the memory limit for the user is 16 MB and the job tries to use more than that, then you will see these resource limit errors.

Check what the limits are for the user with which you are trying to run the job (for example, with ulimit -a).
Regards,
S. Kirtikumar.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use file systems with more space available.
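
Each node may also list more than one resource disk and scratchdisk entry, and the engine spreads its files across all of them, so adding entries on a second file system is a quick way to gain space. A sketch, with the second mount point invented:

{
    node "node1"
    {
        fastname "my_fastname"
        pools ""
        resource disk "/fs_stage_dataset/Datasets" {pools ""}
        resource disk "/fs_more_space/Datasets" {pools ""}
        resource scratchdisk "/fs_stage_dataset/Scratch" {pools ""}
        resource scratchdisk "/fs_more_space/Scratch" {pools ""}
    }
}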

For less than $1/week you can buy the right to read premium posts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
avi21st
Charter Member
Posts: 135
Joined: Thu May 26, 2005 10:21 am
Location: USA

Re: DataStage Configuration File

Post by avi21st »

pankajg wrote: I am trying to understand a few things related to the configuration file. [...] I would like to clarify what exactly a node pool and a disk pool are. What is the default pool ""? How are nodes associated with the pools?
Check one of the posts in viewtopic.php?p=188113&highlight=#188113

This might throw some light on config file design.
Avishek Mukherjee
Data Integration Architect
Chicago, IL, USA.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Enrol in a training class. Read about configuration files in the Manager Guide.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

ok, so how about posting some background info?

Post by jgreve »

pankajg wrote: Currently, my configuration is set to 8 nodes on a 16-CPU system (SMP). It has a SAN, and all my disk and scratch space is set to fs_stage_dataset/Dataset or fs_stage_dataset/Scratch.

We have issues like SIGKILL and SIGBUS, and these are due to resource constraints. The most frequent errors we get are disk space full and scratch space full; others are heap overflow and the like.
I'm new to this config stuff, but it would help if you posted your 8-node config file.
How many jobs do you have?
How many jobs are you running at the same time?
Do any of your jobs actually run to completion, or do they all blow up on resource issues?

I guess, for one of the blowing-up jobs, I'd like to see your OSH score (via APT_DUMP_SCORE).

For starters, I'd like to verify that your node names are all different. I'm sure the "fastname" is the same for each node, but if the node names are the same, you're running 8 players on one CPU (at least I think that is how it works).

You say you're hitting heap overflows; just how much RAM does each of your CPUs have? And what OS & version are you running on the SMP box?

Re: disk full, that pretty much means it is time to buy more disk. Or... go delete some stuff. What are your disk stats, anyway? Total, % used, that sort of thing.
pankajg wrote: I believe this could be corrected by modifying the configuration file.
Well, I certainly admire your optimism :)
I am a tiny bit skeptical that config-file changes will fix 100% of your headaches. But perhaps you can convince me.

What kinds of jobs are you running? What else is running on your SMP box? Is it ONLY DataStage? If yes, how do you know that it is only running DataStage?

Among the parallel environment variables, you might try APT_THIN_SCORE: this variable can reduce parallel-job memory usage (only useful if you're exhausting real memory, or need the extra memory for sorting/buffering).

Good luck,
John G.
pankajg
Participant
Posts: 39
Joined: Mon Jun 05, 2006 5:24 am
Location: India

Post by pankajg »

Thank you all for your replies.
Failures push you towards Success.