Creating new configuration file

subha saravanan · Post by **subha saravanan** » Tue Jan 23, 2007 1:22 am

Hi All,
Before creating a configuration file, how do I decide the number of nodes, pools and disk space? What are the design considerations?

Regards,
Subha

ray.wurlod · Post by **ray.wurlod** » Tue Jan 23, 2007 2:38 am

Welcome aboard. :D

The default configuration file (default.apt) comprises two nodes, each using disk resource within the DataStage Engine directory - because the install script can be sure that this location exists.

Your main considerations will be the volume of data to be processed and the available hardware resources - CPUs, memory and disk space.

You will create more than one configuration file, because not all jobs will require the full degree of parallelism of which your system is capable. But in the development environment you only need a two-node configuration file, since if it runs on two it will run on 2000.

subha saravanan · Post by **subha saravanan** » Tue Jan 23, 2007 7:25 am

Thanks for your quick reply. I would like to know how to decide the number of nodes for optimized parallelism. What are the design considerations when creating a configuration file with that goal?

ArndW · Post by **ArndW** » Tue Jan 23, 2007 9:09 am

Create several configuration files, from 1 node through to the number of CPUs you have on your system. Make the configuration file a parameter to your job and measure performance, starting with 1 node and working your way up. In many cases the 1-node configuration may give you the best performance.
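As a rough illustration of the "1 node through to the number of CPUs" idea, here is a minimal Python sketch that writes one configuration file per node count. The host name `etl_host`, the disk and scratch paths, and the `Nnode.apt` file names are all hypothetical, and every logical node points at the same directories purely to keep the sketch short; adjust them to your own system.

```python
from pathlib import Path


def render_config(num_nodes, fastname, disk_dir, scratch_dir):
    """Return the text of a configuration file with num_nodes logical nodes."""
    blocks = []
    for i in range(1, num_nodes + 1):
        blocks.append(
            f'  node "node{i}"\n'
            "  {\n"
            f'    fastname "{fastname}"\n'
            '    pools ""\n'
            f'    resource disk "{disk_dir}" {{pools ""}}\n'
            f'    resource scratchdisk "{scratch_dir}" {{pools ""}}\n'
            "  }\n"
        )
    return "{\n" + "".join(blocks) + "}\n"


def write_configs(out_dir, max_nodes, fastname, disk_dir, scratch_dir):
    """Write 1node.apt .. <max_nodes>node.apt into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(1, max_nodes + 1):
        path = out / f"{n}node.apt"
        path.write_text(render_config(n, fastname, disk_dir, scratch_dir))
        paths.append(path)
    return paths


# Hypothetical host name and paths, shown only to illustrate the layout.
print(render_config(2, "etl_host", "/data/ds/disk", "/data/ds/scratch"))
```

Calling `write_configs("configs", 8, ...)` on an 8-CPU machine would give you eight files to pass to the job one at a time while you measure run times.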

pavankvk · Post by **pavankvk** » Tue Jan 23, 2007 3:22 pm

As a general principle, the number of nodes should be half the number of processors on an SMP system. Ascential recommended this to us.

ray.wurlod · Post by **ray.wurlod** » Tue Jan 23, 2007 3:34 pm

"Optimized" varies on a job by job basis, and indeed even on a run by run basis. There is no such thing as a "one size fits all" configuration file.

I_Server_Whale · Post by **I_Server_Whale** » Tue Jan 23, 2007 3:37 pm

ray.wurlod wrote: "Optimized" varies on a job by job basis, and indeed even on a run by run basis. There is no such thing as a "one size fits all" configuration file.

Great answer! That's why you can assign a customized configuration file to a particular job, based on the design of that job. Am I right, Ray?

ray.wurlod · Post by **ray.wurlod** » Tue Jan 23, 2007 3:48 pm

Partly. It's why best practice is always to set up $APT_CONFIG_FILE as a job parameter, so that you can run a job using different configuration files depending (for example) on the volume of data to be processed. A retail DW might ordinarily use ten nodes, but during the post-Xmas sales have much more data, so run using sixteen nodes. But for a job that pre-loads Lookup File Sets, maybe one or two nodes suffice even at the busiest of times.
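To make the volume-based choice concrete, here is a small Python sketch of how a scheduling script might pick the value to pass as the $APT_CONFIG_FILE job parameter. The row-count thresholds and file paths are invented for illustration only; the right cut-over points have to come from measuring your own jobs.

```python
# Hypothetical thresholds: (minimum expected rows, configuration file), largest first.
CONFIG_BY_VOLUME = [
    (50_000_000, "/opt/ds/configs/16node.apt"),  # e.g. post-Xmas sales volumes
    (1_000_000, "/opt/ds/configs/10node.apt"),   # ordinary daily load
    (0, "/opt/ds/configs/2node.apt"),            # small jobs, e.g. lookup pre-loads
]


def choose_config(expected_rows, table=CONFIG_BY_VOLUME):
    """Return the first configuration file whose threshold the volume meets."""
    for min_rows, path in table:
        if expected_rows >= min_rows:
            return path
    raise ValueError("expected_rows must be non-negative")


# The returned path would be supplied as the job's $APT_CONFIG_FILE parameter.
print(choose_config(3_500_000))
```

Keeping the table outside the function means the thresholds can be tuned per job without touching the selection logic.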

I_Server_Whale · Post by **I_Server_Whale** » Tue Jan 23, 2007 4:05 pm

Nice example. So the volume of data plays a vital role as well. Thanks, Ray!