How are DS processes managed?

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

How are DS processes managed?

Post by rafik2k »

Hi All,

I have a general question on DS. Apologies if it seems a silly one.
Basically I want to know:

1) Which file on the DS server controls things like the allocation and deallocation of resources to stages/jobs, etc.?

2) How are the processes managed?

3) How is resource allocation done at the OS level and at the DS level?

Any link to a previous post or other info on the same would help me.


Thanks in advance

Regards,
Rafik
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

DataStage Server and PX allocate resources in different ways. Your post is tagged with the Server product, so we'll stick with that side of things. The configuration of the Server environment is controlled principally through the settings in the UVCONFIG file, and in most cases the default settings are also the most efficient ones.
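Purely as an illustration (the file name, paths and tunables vary by platform and release, so treat this as a sketch from memory rather than a recipe), the engine tunables can be inspected and regenerated along these lines:

# Illustrative sketch only - check your own installation
cd $DSHOME
bin/smat -t                                          # show the tunable values currently in effect
grep -E 'MFILES|T30FILE|GLTABSZ|RLTABSZ' uvconfig    # a few of the common tunables
bin/uv -admin -stop                                  # stop the engine before changing anything
bin/uvregen                                          # regenerate the binary configuration after editing uvconfig
bin/uv -admin -start                                 # restart the engine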
DataStage jobs run as normal OS processes forked from a parent. They allocate process memory through standard OS calls such as malloc() and will use shared memory for some data (e.g. when system file sharing is turned on), but otherwise they will allocate as much memory as they need and consume as much CPU as they can get.
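As a rough sketch using only standard OS commands (the user name dsadm is just an example for the engine account), you can watch that from the shell while a job is running:

# Illustrative only
ps -o pid,ppid,vsz,rss,args -u dsadm     # memory footprint of each engine process
ipcs -m                                  # shared memory segments currently allocated on the box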
The amount of resources a given job and its processes use is determined in large part by the settings you give the job in your design. If you load a hashed file to memory, it will consume more memory than if you had not. If you set your I/O to buffer reads or writes, it will also use more memory (and hopefully execute much faster because of it) than with no buffering.
In Server jobs the engine takes care of process resources for you, and the engine tunables affect mainly the performance of hashed files and the engine itself, and to a lesser extent the overall performance of the jobs.
PX is a different question altogether; each stage has so many tuning options that it becomes quite complex to try to manage directly, and the system defaults are almost always more efficient. There are also a score of environment settings that can be changed at runtime to alter memory usage, buffering and communication, and to modify activity.
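As an illustrative sketch only (these are commonly documented PX variables; the values are examples, not recommendations), a few of those runtime settings look like this when set as environment variables or job parameters:

# Values are examples only - the defaults are usually fine
export APT_CONFIG_FILE=/path/to/your.apt       # which configuration file (node layout) to run against
export APT_BUFFER_MAXIMUM_MEMORY=3145728       # memory per buffer operator before spilling to disk (bytes)
export APT_BUFFER_DISK_WRITE_INCREMENT=1048576 # write increment used when a buffer does spill (bytes)
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=131072 # block size for data shipped between player processes (bytes)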
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An answer from a slightly different perspective.

(1) Resource allocation in server jobs and job sequences is "on demand". The "file" in which the demands are recorded is not in fact a file, but rather a database table in the Repository called RT_CONFIGnnn, where nnn is the job number. Records in this table are created by compiling the job successfully. Resource allocation in parallel jobs is again "on demand". The "file" in which the demands are recorded is again not a real file, but a virtual file called the "score", which is composed by the conductor process and distributed to the section leader process on each processing node. Each operator has its own built-in "on demand" strategy for demanding resources, which is influenced by stage property settings.
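If you want to look at those two things for yourself, here is a rough sketch (the job number 123 and the project path are made-up examples, and the commands are from memory, so verify against your own installation):

cd /path/to/your/project                 # the project directory on the server
$DSHOME/bin/uvsh                         # engine command shell
LIST RT_CONFIG123                        # the per-job "demand" records described above
QUIT
# For a parallel job, set APT_DUMP_SCORE=1 (as a job parameter or project default)
# and the composed score is written to the job's log, viewable in Director.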

(2) In server jobs a background process ("phantom") executes uvsh to run a DataStage BASIC program called DSD.RUN, which may in turn fork child processes to execute subroutines such as DSD.StageRun to execute the code from a Transformer stage. Job sequences also execute DSD.RUN, since they are just special cases of server jobs. In parallel jobs the conductor process executes osh (Orchestrate shell) and retains control - and is the only process to log anything from the job. Section leader processes are started on each processing node in the currently selected configuration file; they also run osh. They will fork player processes to execute the operators mentioned in the score.
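To watch that process tree while a job runs, a sketch with ordinary OS commands (exactly what appears on the command lines will differ by release):

# Server job: the phantom shows DSD.RUN on its command line
ps -ef | grep "DSD.RUN" | grep -v grep
# Parallel job: one osh conductor, a section leader per node, and the player processes
ps -eo pid,ppid,args | grep " osh " | grep -v grep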

(3) These are all independent operating system processes, and are allocated resources by the operating system based on their demand levels. The only allocation of resources within DataStage is the optimization that may occur when the score is being composed; for example, compatible operators may be combined into a single process.
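To see that combination happening (or not), one illustrative approach is to dump the score twice, the second time with combination disabled, and compare the two job logs:

export APT_DUMP_SCORE=1              # write the composed score to the job log
export APT_DISABLE_COMBINATION=1     # re-run with this set to see the operators uncombined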
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

Post by rafik2k »

Thanks Ray and ArndW for sharing this!


Thanks and Regards,
Rafik