Datastage configuration file

chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Datastage configuration file

Post by chandra.shekhar@tcs.com »

Dear All,

I need some information about the DataStage configuration file.

We have two DataStage + QualityStage engines: Engine 1 has 10 cores (CPUs) and Engine 2 has 9 cores (CPUs), with 40 GB of RAM on each server.

What configuration file should we create so that jobs can take advantage of the processing power of both servers?

My sample configuration file is as follows:

{
    node "node1"
    {
        fastname "Engine1"
        pools ""
        resource disk "/resource1" {pools ""}
        resource scratchdisk "/scratch1" {pools ""}
    }
    node "node2"
    {
        fastname "Engine2"
        pools ""
        resource disk "/resource2" {pools ""}
        resource scratchdisk "/scratch2" {pools ""}
    }
}


Can anybody explain to me how the configuration file should relate to the CPUs (10 + 9)?

Thanks.
Thanx and Regards,
ETL User
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

You can start with the below.

Number of nodes = half the number of cores.
Arvind
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

arvind_ds wrote:You can start with below.

No of nodes = Half the number of cores.
Does that mean I can use the following combination?

10 CPUs = 20 nodes and 9 CPUs = 18 nodes.

So can I use 20 + 18 = 38 nodes for loading data, or could using 38 nodes hamper performance? And how can I work out how many nodes to use for a particular job?

Thanks
Thanx and Regards,
ETL User
chanaka
Premium Member
Posts: 96
Joined: Tue Sep 15, 2009 4:06 am
Location: United States

Post by chanaka »

That depends on the complexity of the jobs that you run. If you have the grid version of InfoSphere Information Server, this is taken care of by the resource manager. Otherwise, you have to use different configuration files based on the complexity of the job.

From your explanation it sounds like an SMP cluster. Check out the link below; it may help you further.
http://publib.boulder.ibm.com/infocente ... n_SMP.html
Chanaka Wagoda
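One common way to apply the suggestion above of using different configuration files per job is the APT_CONFIG_FILE environment variable, which parallel jobs read to locate their configuration file. A minimal sketch, assuming hypothetical file names and an install directory that will differ on your system:

```shell
#!/bin/sh
# Select a parallel configuration file for a job run by setting
# APT_CONFIG_FILE before the run (it is also commonly exposed as the
# $APT_CONFIG_FILE job parameter in Designer/Director).
# The directory and file names below are placeholders, not real paths.
CONFIG_DIR=/opt/IBM/InformationServer/Server/Configurations
export APT_CONFIG_FILE=$CONFIG_DIR/two_engine_10node.apt
echo "This run would use: $APT_CONFIG_FILE"
```

Lightweight jobs could then point at a small (e.g. 2-node) file and heavy loads at a full multi-node file, without changing the job design.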
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You also have to balance it against how much data you are processing.

Is your job CPU-bound or I/O-bound?

A bigger configuration (more nodes) is not always going to be faster than a smaller one.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chandra.shekhar@tcs.com wrote:
arvind_ds wrote:You can start with below.

No of nodes = Half the number of cores.
Means that I can use following combination.

10 cpu =20 Nodes and 9 cpu =18 nodes.
No, it doesn't. It means
10 CPUs = 5 nodes and 9 CPUs = 5 nodes (rounding half of 9 up).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
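Putting those numbers back into the original sample, a sketch of the resulting 10-node file might look like the below. The fastnames and resource paths come from the sample earlier in the thread; the node names, the choice to round half of 9 up to 5, and the reuse of one disk/scratch path per server are assumptions — separate scratch file systems per node are often preferred to reduce I/O contention:

```
{
    node "node1"  { fastname "Engine1" pools "" resource disk "/resource1" {pools ""} resource scratchdisk "/scratch1" {pools ""} }
    node "node2"  { fastname "Engine1" pools "" resource disk "/resource1" {pools ""} resource scratchdisk "/scratch1" {pools ""} }
    node "node3"  { fastname "Engine1" pools "" resource disk "/resource1" {pools ""} resource scratchdisk "/scratch1" {pools ""} }
    node "node4"  { fastname "Engine1" pools "" resource disk "/resource1" {pools ""} resource scratchdisk "/scratch1" {pools ""} }
    node "node5"  { fastname "Engine1" pools "" resource disk "/resource1" {pools ""} resource scratchdisk "/scratch1" {pools ""} }
    node "node6"  { fastname "Engine2" pools "" resource disk "/resource2" {pools ""} resource scratchdisk "/scratch2" {pools ""} }
    node "node7"  { fastname "Engine2" pools "" resource disk "/resource2" {pools ""} resource scratchdisk "/scratch2" {pools ""} }
    node "node8"  { fastname "Engine2" pools "" resource disk "/resource2" {pools ""} resource scratchdisk "/scratch2" {pools ""} }
    node "node9"  { fastname "Engine2" pools "" resource disk "/resource2" {pools ""} resource scratchdisk "/scratch2" {pools ""} }
    node "node10" { fastname "Engine2" pools "" resource disk "/resource2" {pools ""} resource scratchdisk "/scratch2" {pools ""} }
}
```

Because every node is in the default pool (pools ""), a stage run with default settings gets ten partitions, five on each physical server.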