Hi
These are some pointers: we need to design the configuration file so that the data is divided into partitions (depending on the partitioning logic) and processed using parallelism in a DataStage job. You also need to design and allocate appropriate space for the resource disk and scratch disk in your configuration file.
The configuration file tells DataStage Enterprise Edition how to exploit underlying system resources (processing, temporary storage, and dataset storage). In more advanced environments, the configuration file can also define other resources such as databases and buffer storage. At runtime, DataStage first reads the configuration file, specified through the environment variable $APT_CONFIG_FILE, to determine what system resources are allocated to it, and then distributes the job flow across those resources.
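As a rough sketch, a two-node configuration file might look like the one below. The host name (etlserver) and the /data/datastage paths are placeholders you would replace with your own; each node names a resource disk (for persistent datasets) and a scratch disk (for temporary sort/buffer files):

```
{
    node "node1" {
        fastname "etlserver"
        pools ""
        resource disk "/data/datastage/resource" {pools ""}
        resource scratchdisk "/data/datastage/scratch" {pools ""}
    }
    node "node2" {
        fastname "etlserver"
        pools ""
        resource disk "/data/datastage/resource" {pools ""}
        resource scratchdisk "/data/datastage/scratch" {pools ""}
    }
}
```

The job picks this file up through $APT_CONFIG_FILE, e.g. by exporting APT_CONFIG_FILE=/path/to/2node.apt before the run (the path is illustrative).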
The Enterprise Edition runs on systems that meet the following requirements:
200 MB of free disk space for product installation
256 MB or more of memory per processing node, depending on the application
At least 500 MB of scratch disk space per processing node
Within a configuration file, the number of processing nodes defines the
degree of parallelism and resources that a particular job will use to run. It is up to the UNIX operating system to actually schedule and run the processes that make up a DataStage job across physical processors. A configuration file with a larger number of nodes generates a larger number of processes that use more memory (and perhaps more disk activity) than a configuration file with a smaller number of nodes.
While the DataStage documentation suggests creating half as many nodes as physical CPUs, this is a conservative starting point that is highly dependent on system configuration, resource availability, job design, and other applications sharing the server hardware. For example, if a job is heavily I/O-bound or dependent on external (e.g., database) sources or targets, it may be appropriate to have more nodes than physical CPUs.
For typical production environments, a good starting point is to set the number of nodes equal to the number of CPUs. For development environments, which are typically smaller and more resource-constrained, create smaller configuration files (e.g., 2-4 nodes).
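Following that guideline, a production server with four CPUs could use a four-node file like the sketch below. The host name (prodserver) and the /fs1-/fs4 paths are assumptions for illustration; spreading resource and scratch disks across separate file systems, where available, helps reduce I/O contention between the nodes:

```
{
    node "node1" {
        fastname "prodserver"
        pools ""
        resource disk "/fs1/ds/resource" {pools ""}
        resource scratchdisk "/fs1/ds/scratch" {pools ""}
    }
    node "node2" {
        fastname "prodserver"
        pools ""
        resource disk "/fs2/ds/resource" {pools ""}
        resource scratchdisk "/fs2/ds/scratch" {pools ""}
    }
    node "node3" {
        fastname "prodserver"
        pools ""
        resource disk "/fs3/ds/resource" {pools ""}
        resource scratchdisk "/fs3/ds/scratch" {pools ""}
    }
    node "node4" {
        fastname "prodserver"
        pools ""
        resource disk "/fs4/ds/resource" {pools ""}
        resource scratchdisk "/fs4/ds/scratch" {pools ""}
    }
}
```

For development, the same structure with only two node entries (and correspondingly smaller disk allocations) is usually enough.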