
How can you divide memory among partitions?

Posted: Thu Aug 03, 2006 12:30 pm
by splayer
Came across this in the Advanced Developer's Guide:

"Parallel jobs with large memory requirements can benefit from parallelism if they act on data that has been partitioned and if the required memory is also divided among partitions"

I know that data can be partitioned by using the partitioning tab, but how can you divide the required memory among partitions?

Thanks to anyone who responds.

Configuration File design

Posted: Thu Aug 03, 2006 2:15 pm
by avi21st
Hi

These are some pointers: you need to design the configuration file for dividing the data into partitions (depending on the logic) and for using the concept of parallelism in a DataStage job. You also need to design and allocate appropriate space for the resource disk and scratch disk in your configuration file.

The configuration file tells DataStage Enterprise Edition how to exploit underlying system resources (processing, temporary storage, and dataset storage). In more advanced environments, the configuration file can also define other resources such as databases and buffer storage. At runtime, a job first reads the configuration file to determine what system resources are allocated to it, and then distributes the job flow across these resources. The configuration file to use is specified through the environment variable $APT_CONFIG_FILE.
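For example, you could point the variable at whichever configuration file a job should use (the path below is only a placeholder):

export APT_CONFIG_FILE=/opt/datastage/Configurations/4node.apt

It is also common to expose $APT_CONFIG_FILE as a job parameter, so the same job can be run against different configuration files without any design changes.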

The Enterprise Edition runs on systems that meet the following requirements:
200 MB of free disk space for product installation
256 MB or more of memory per processing node, depending on the application
At least 500 MB of scratch disk space per processing node


Within a configuration file, the number of processing nodes defines the
degree of parallelism and resources that a particular job will use to run. It is up to the UNIX operating system to actually schedule and run the processes that make up a DataStage job across physical processors. A configuration file with a larger number of nodes generates a larger number of processes that use more memory (and perhaps more disk activity) than a configuration file with a smaller number of nodes.
While the DataStage documentation suggests creating half as many nodes as there are physical CPUs, this is a conservative starting point that is highly dependent on system configuration, resource availability, job design, and other applications sharing the server hardware. For example, if a job is highly I/O dependent or dependent on external (e.g., database) sources or targets, it may be appropriate to have more nodes than physical CPUs.

For typical production environments, a good starting point is to set the number of nodes equal to the number of CPUs. For development environments, which are typically smaller and more resource-constrained, create smaller configuration files (e.g., 2-4 nodes).
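To make this concrete, below is a rough sketch of a 4-node configuration file (the fastname and the disk paths are only placeholders; substitute the server name and file systems in your own environment):

{
    node "node1"
    {
        fastname "etl_host"
        pools ""
        resource disk "/ds/disk1" {pools ""}
        resource disk "/ds/disk2" {pools ""}
        resource disk "/ds/disk3" {pools ""}
        resource disk "/ds/disk4" {pools ""}
        resource scratchdisk "/ds/scratch1" {pools ""}
    }
    node "node2"
    {
        fastname "etl_host"
        pools ""
        resource disk "/ds/disk2" {pools ""}
        resource disk "/ds/disk3" {pools ""}
        resource disk "/ds/disk4" {pools ""}
        resource disk "/ds/disk1" {pools ""}
        resource scratchdisk "/ds/scratch2" {pools ""}
    }
    node "node3"
    {
        fastname "etl_host"
        pools ""
        resource disk "/ds/disk3" {pools ""}
        resource disk "/ds/disk4" {pools ""}
        resource disk "/ds/disk1" {pools ""}
        resource disk "/ds/disk2" {pools ""}
        resource scratchdisk "/ds/scratch3" {pools ""}
    }
    node "node4"
    {
        fastname "etl_host"
        pools ""
        resource disk "/ds/disk4" {pools ""}
        resource disk "/ds/disk1" {pools ""}
        resource disk "/ds/disk2" {pools ""}
        resource disk "/ds/disk3" {pools ""}
        resource scratchdisk "/ds/scratch4" {pools ""}
    }
}

In the 4-node example above, the order of the disks is purposely shifted for each node, in an attempt to minimize I/O contention.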

Posted: Thu Aug 03, 2006 11:18 pm
by splayer
Thank you Avishekd for your detailed response.

Posted: Fri Aug 04, 2006 2:09 am
by kumar_s
"In the 4-node example above, the order of the disks is purposely shifted for each node, in an attempt to minimize I/O contention."
Hi Avishek,
Do you mean that DataStage prefers to perform the intensive I/O operations on the first disk specified in the list?