Creating new configuration file

subha saravanan
Participant
Posts: 2
Joined: Fri Dec 01, 2006 2:47 am

Post by subha saravanan »

Hi All,
Before creating a configuration file, how do I decide the number of nodes, the pools and the disk space? What are the design considerations?

Regards,
Subha
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

The default configuration file (default.apt) comprises two nodes, each using disk resource within the DataStage Engine directory - because the install script can be sure that this location exists.

Your main considerations will be the volume of data to be processed and the available hardware resources - CPUs, memory and disk space.

You will create more than one configuration file, because not all jobs will require the full degree of parallelism of which your system is capable. But in the development environment you only need a two-node configuration file, since if it runs on two it will run on 2000.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
subha saravanan
Participant
Posts: 2
Joined: Fri Dec 01, 2006 2:47 am

Post by subha saravanan »

Thanks for your quick reply. I would like to know how to decide the number of nodes for optimized parallelism. What are the design considerations when creating a configuration file with that goal?


ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Create several configuration files, from 1 node through to the number of CPUs you have on your system. Make the configuration file a parameter to your job and measure performance, starting with 1 node and working your way up. In many cases the 1-node configuration may give you the best performance.
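A minimal sketch of how that set of files could be generated, in Python. The host name, the disk and scratch paths, and the `<n>node.apt` file names are hypothetical placeholders, not values from this thread; the node layout (`fastname`, `pools`, `resource disk`, `resource scratchdisk`) follows the usual shape of a parallel configuration file, so check it against your own default.apt before relying on it.

```python
def render_config(n_nodes, fastname="etlhost",
                  disk="/data/datasets", scratch="/data/scratch"):
    """Return the text of a configuration file with n_nodes logical nodes.

    fastname, disk and scratch are placeholder values (assumptions);
    substitute the host name and directories of your own server.
    """
    if n_nodes < 1:
        raise ValueError("a configuration file needs at least one node")
    lines = ["{"]
    for i in range(1, n_nodes + 1):
        lines += [
            f'    node "node{i}"',
            "    {",
            f'        fastname "{fastname}"',
            '        pools ""',
            f'        resource disk "{disk}" {{pools ""}}',
            f'        resource scratchdisk "{scratch}" {{pools ""}}',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


def config_set(max_nodes, **kwargs):
    """Map a file name to file text for every node count from 1 to max_nodes."""
    return {f"{n}node.apt": render_config(n, **kwargs)
            for n in range(1, max_nodes + 1)}


if __name__ == "__main__":
    # Print the two-node variant; write each entry of config_set() to disk
    # to get one file per degree of parallelism to test against.
    print(render_config(2))
```

Passing the CPU count as `max_nodes` gives one file per degree of parallelism, and each can then be supplied as the configuration file parameter while you time the job.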
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

As a general principle, the number of nodes should be half the number of processors on an SMP system. Ascential recommended this to us.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"Optimized" varies on a job by job basis, and indeed even on a run by run basis. There is no such thing as a "one size fits all" configuration file.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

ray.wurlod wrote: "Optimized" varies on a job by job basis, and indeed even on a run by run basis. There is no such thing as a "one size fits all" configuration file.
Great answer! That's why you can assign a customized configuration file to a particular job, based on the design of that job. Am I right, Ray?
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Partly. It's why best practice is always to set up $APT_CONFIG_FILE as a job parameter so that you can run a job using different configuration files, depending (for example) on the volume of data to be processed. A retail DW might ordinarily use ten nodes, but during the post-Xmas sales have much more data, so run using sixteen nodes. But for a job that pre-loads Lookup File Sets, maybe one or two nodes suffice even at the busiest of times.
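A rough sketch of that idea in Python, choosing a configuration file from the expected row count and building a dsjob command line that passes it as the $APT_CONFIG_FILE job parameter. The row thresholds, file paths, and project and job names are all hypothetical, and the `dsjob -run -param name=value -jobstatus project job` form should be checked against the documentation for your release.

```python
# Hypothetical tiers: (upper bound on expected rows, configuration file).
# The numbers and paths are illustrative assumptions only.
TIERS = [
    (1_000_000, "/opt/etl/config/2node.apt"),
    (50_000_000, "/opt/etl/config/10node.apt"),
]
# Used when the volume exceeds every tier, e.g. the post-Xmas sales peak.
PEAK_CONFIG = "/opt/etl/config/16node.apt"


def pick_config(expected_rows):
    """Return the configuration file for the first tier that fits the volume."""
    for upper_bound, path in TIERS:
        if expected_rows <= upper_bound:
            return path
    return PEAK_CONFIG


def dsjob_argv(project, job, expected_rows):
    """Build (but do not run) a dsjob command that sets $APT_CONFIG_FILE."""
    return [
        "dsjob", "-run",
        "-param", f"$APT_CONFIG_FILE={pick_config(expected_rows)}",
        "-jobstatus", project, job,
    ]


if __name__ == "__main__":
    # "RetailDW" and "LoadSales" are made-up project and job names.
    print(" ".join(dsjob_argv("RetailDW", "LoadSales", 80_000_000)))
```

Handing the list to `subprocess.run` avoids shell quoting problems with the `$` in the parameter name.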
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Nice example. So the volume of data plays a vital role as well. Thanks, Ray!