Reg. Configuration File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Reg. Configuration File

Post by chandra.shekhar@tcs.com »

Hi,
Can anybody explain to me the following code from the configuration file?
And what difference does it make when the word "DB2" is used?
I am a bit slow in understanding the inner logic of the file. :oops:

Code: Select all

{
	node "node1_1"
	{
		fastname "brhaspati"
		pools ""
		resource disk "/resource1" {pools ""}
		resource scratchdisk "/scratch1" {pools ""}
	}
node "node1_2"
	{
		fastname "brhaspati"
		pools "DB2"
		resource disk "/resource1" {pools ""}
		resource scratchdisk "/scratch1" {pools ""}
	}
}
Thanx and Regards,
ETL User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This configuration offers two nodes, only one of which is in the default node pool (the one with "" as its pool name). The other one is in a node pool called "DB2".

Non-DB2 stages will, unless specified otherwise, execute in the default node pool. In your configuration that means they will all run sequentially, since there's only one node in the pool.

DB2 stages will automatically seek out a node pool called "DB2" and execute in that. If there is no "DB2" node pool, they will also execute in the default node pool.

As far as I can see this configuration file is a misguided attempt to separate the DB2 processing from the other processing. The problem is that it has sacrificed all the benefits of parallelism to do so, without any gains in overall processing efficiency since all nodes are on the same machine.

If there were two or more processing (default) nodes, and maybe multiple nodes in the "DB2" node pool corresponding to the number of table partitions, then we might have a different story!
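For illustration only, here is a minimal sketch of what such a configuration might look like. The host names (serverA, serverB), node names, and resource paths are assumptions, not taken from your file; in practice the "DB2" pool nodes would be aligned with the DB2 table partitions:

Code: Select all

{
	node "node1"
	{
		fastname "serverA"
		pools ""
		resource disk "/resource1" {pools ""}
		resource scratchdisk "/scratch1" {pools ""}
	}
	node "node2"
	{
		fastname "serverB"
		pools ""
		resource disk "/resource1" {pools ""}
		resource scratchdisk "/scratch1" {pools ""}
	}
	node "db2node1"
	{
		fastname "serverA"
		pools "DB2"
		resource disk "/resource1" {pools ""}
		resource scratchdisk "/scratch1" {pools ""}
	}
	node "db2node2"
	{
		fastname "serverB"
		pools "DB2"
		resource disk "/resource1" {pools ""}
		resource scratchdisk "/scratch1" {pools ""}
	}
}
With two machines in the default pool the non-DB2 stages can genuinely run in parallel, while the "DB2" pool nodes keep the DB2 work separate without starving the rest of the job.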
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Thanx Ray.
You are correct; the actual file has 12 default nodes and 12 "DB2" nodes. Just to understand the logic, I pasted only a part of it.
So, according to you, the 12 default nodes will be assigned to non-DB2 stages and the 12 "DB2" nodes to DB2 stages, am I right?
And in this scenario, am I achieving parallelism?
Thanx and Regards,
ETL User
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Yes, you are running on a parallel architecture, with 12 nodes for processing stages and 12 nodes for DB2 stages.
- Zulfi
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Thanx Zulfi.
So in which scenario do you think a normal job will run faster: using 24 default nodes, or a 24-node mix (12 default and 12 DB2 nodes)?
Consider a job

Code: Select all

 Seq File -->Tfr-->DB2 Connector 
The source has around 100 million records.
Only a Null/Valid Date check happens in the Tfr, and that too for only about 10% of the columns.
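Just to be concrete, the check in the Tfr is roughly of this form (a sketch only; the link name lnk_in and column order_date are made-up examples, assuming the incoming column is a string):

Code: Select all

If IsNull(lnk_in.order_date) Or Not(IsValid("date", lnk_in.order_date))
Then SetNull()
Else StringToDate(lnk_in.order_date)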
What do you say?
Thanx and Regards,
ETL User
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

chandra.shekhar@tcs.com wrote: So in which scenario do you think a normal job will run faster: using 24 default nodes, or a 24-node mix (12 default and 12 DB2 nodes)?

What do you say?
To answer the above, there is a lot to say :wink:

Increasing the number of nodes on and on won't make your job run faster; you need to understand your hardware to decide how far you can push parallelism.

Adding too many nodes increases the overhead of managing the numerous processes.

If you are not sure what lies under the hood, then perform trial-and-error runs to find the number of nodes that gives the best performance (which in your case you define as processing speed). Be aware that this node count will also depend on the varying load on the server at the time the tests are performed.
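A common way to run such trials (a sketch; the file names and path are hypothetical) is to keep several configuration files of different sizes and point each test run at one of them through the $APT_CONFIG_FILE environment variable or the equivalent job parameter:

Code: Select all

# hypothetical locations - one configuration file per node count
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/4node.apt
# run the job and note the elapsed time, then repeat with 8node.apt, 12node.apt, 24node.apt

Compare the timings under a similar server load and pick the smallest configuration that meets your target.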
- Zulfi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

See if you can talk to a site that is using a comparably sized configuration, for example Target Corporation (they have offices in Minneapolis and Bangalore). One of their configurations has 10 processing nodes and 24 DB2 nodes (12 for reading, 12 for writing).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Thanx Zulfi and Ray for your responses.
I will test and let you know.
Thanx and Regards,
ETL User