How are DS processes managed?

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

How are DS processes managed?

Post by rafik2k »

Hi All,

I have a general question on DS. Apologies if it seems a silly one.
Basically I want to know:

1) Which file on the DS server controls things like the allocation and deallocation of resources to stages/jobs, etc.?

2) How are the processes managed?

3) How is resource allocation done at the OS level and at the DS level?

Any link to a previous post or other info on the same would help me.


Thanks in advance

Regards,
Rafik
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

DataStage Server and PX allocate resources in different ways. Your post is tagged with the Server product, so we'll stick with that side of things. The configuration of the Server environment is controlled principally through the settings in the UVCONFIG file, and in most cases the default settings are also the most efficient ones.
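Purely as an illustration (the file name, paths and tunables vary by platform and release, so treat this as a sketch from memory rather than a recipe), the engine tunables can be inspected and regenerated along these lines:

# Illustrative sketch only - check your own installation
cd $DSHOME
bin/smat -t                                          # show the tunable values currently in effect
grep -E 'MFILES|T30FILE|GLTABSZ|RLTABSZ' uvconfig    # a few of the common tunables
bin/uv -admin -stop                                  # stop the engine before changing anything
bin/uvregen                                          # regenerate the binary configuration after editing uvconfig
bin/uv -admin -start                                 # restart the engine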
DataStage jobs run as normal OS processes forked from a parent. They allocate process memory through standard OS calls such as malloc() and will use shared memory for some data (e.g. when system file sharing is turned on), but otherwise they will allocate as much memory as they need and consume as much CPU as they can get.
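As a rough sketch using only standard OS commands (the user name dsadm is just an example for the engine account), you can watch that from the shell while a job is running:

# Illustrative only
ps -o pid,ppid,vsz,rss,args -u dsadm     # memory footprint of each engine process
ipcs -m                                  # shared memory segments currently allocated on the box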
The amount of resources a given job and its processes use is determined in large part by the settings you give the job in your design. If you load a hashed file to memory, it will consume more memory than if you had not. If you set your I/O to buffer reads or writes, it will also use more memory (and hopefully execute much faster because of it) than with no buffering.
In Server jobs the engine takes care of process resources for you, and the engine tunables affect mainly the performance of hashed files and the engine itself, and to a lesser extent the overall performance of the jobs.
PX is a different question altogether; each stage has so many tuning options that it becomes quite complex to try to manage directly, and the system defaults are almost always more efficient. There are also a score of environment settings that can be changed at runtime to alter memory usage, buffering and communication, and to modify activity.
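As an illustrative sketch only (these are commonly documented PX variables; the values are examples, not recommendations), a few of those runtime settings look like this when set as environment variables or job parameters:

# Values are examples only - the defaults are usually fine
export APT_CONFIG_FILE=/path/to/your.apt       # which configuration file (node layout) to run against
export APT_BUFFER_MAXIMUM_MEMORY=3145728       # memory per buffer operator before spilling to disk (bytes)
export APT_BUFFER_DISK_WRITE_INCREMENT=1048576 # write increment used when a buffer does spill (bytes)
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=131072 # block size for data shipped between player processes (bytes)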
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An answer from a slightly different perspective.

(1) Resource allocation in server jobs and job sequences is "on demand". The "file" in which the demands are recorded is not in fact a file, but rather a database table in the Repository called RT_CONFIGnnn, where nnn is the job number. Records in this table are created by compiling the job successfully. Resource allocation in parallel jobs is again "on demand". The "file" in which the demands are recorded is again not a real file, but a virtual file called the "score", which is composed by the conductor process and distributed to the section leader process on each processing node. Each operator has its own built-in "on demand" strategy for demanding resources, which is influenced by stage property settings.
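If you want to look at those two things for yourself, here is a rough sketch (the job number 123 and the project path are made-up examples, and the commands are from memory, so verify against your own installation):

cd /path/to/your/project                 # the project directory on the server
$DSHOME/bin/uvsh                         # engine command shell
LIST RT_CONFIG123                        # the per-job "demand" records described above
QUIT
# For a parallel job, set APT_DUMP_SCORE=1 (as a job parameter or project default)
# and the composed score is written to the job's log, viewable in Director.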

(2) In server jobs a background process ("phantom") executes uvsh to run a DataStage BASIC program called DSD.RUN, which may in turn fork child processes to execute subroutines such as DSD.StageRun to execute the code from a Transformer stage. Job sequences also execute DSD.RUN, since they are just special cases of server jobs. In parallel jobs the conductor process executes osh (Orchestrate shell) and retains control - and is the only process to log anything from the job. Section leader processes are started on each processing node in the currently selected configuration file; they also run osh. They will fork player processes to execute the operators mentioned in the score.
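To watch that process tree while a job runs, a sketch with ordinary OS commands (exactly what appears on the command lines will differ by release):

# Server job: the phantom shows DSD.RUN on its command line
ps -ef | grep "DSD.RUN" | grep -v grep
# Parallel job: one osh conductor, a section leader per node, and the player processes
ps -eo pid,ppid,args | grep " osh " | grep -v grep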

(3) These are all independent operating system processes, and are allocated resources by the operating system based on their demand levels. The only allocation of resources within DataStage is the optimization that may occur when the score is being composed; for example, compatible operators may be combined into a single process.
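To see that combination happening (or not), one illustrative approach is to dump the score twice, the second time with combination disabled, and compare the two job logs:

export APT_DUMP_SCORE=1              # write the composed score to the job log
export APT_DISABLE_COMBINATION=1     # re-run with this set to see the operators uncombined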
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

Post by rafik2k »

Thanks Ray and ArndW for sharing this!


Thanks and Regards,
Rafik