How parallel jobs execute at runtime wrt DS Engine?

vsi
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

All Gurus,

Please explain in detail:
1. How do parallel jobs execute at runtime?
2. What is the execution process with respect to the DS Engine, the OS level, etc.?
3. What happens internally when we run PX jobs?

Thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

One process (the "conductor") reads the generated OSH and the configuration file (referred to by the APT_CONFIG_FILE environment variable) and, from these, composes the score.

The conductor process then organizes for one process (the "section leader") to be started on each processing node mentioned in the configuration file. It then distributes the score to each of these section leader processes. All this can be seen by setting the environment variables APT_STARTUP_STATUS and APT_DUMP_SCORE.
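
To make the "one section leader per processing node" rule concrete, here is a minimal Python sketch that counts the node entries in a configuration file. The sample file contents, host name, and paths are made up for illustration, and the regular expression is a simplification, not a full parser of the configuration file syntax.

```python
import re

# Hypothetical two-node configuration file; the host name and paths
# are invented for this example.
SAMPLE_CONFIG = '''
{
    node "node1"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/disk1" {pools ""}
        resource scratchdisk "/data/ds/scratch1" {pools ""}
    }
    node "node2"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/disk2" {pools ""}
        resource scratchdisk "/data/ds/scratch2" {pools ""}
    }
}
'''


def node_names(config_text):
    """Return the logical node names declared in the configuration text."""
    # Matches the keyword `node` followed by a quoted name.
    return re.findall(r'\bnode\s+"([^"]+)"', config_text)


nodes = node_names(SAMPLE_CONFIG)
print(f"{len(nodes)} processing nodes, so expect {len(nodes)} section leaders: {nodes}")
```

In practice you would read the text from the file that APT_CONFIG_FILE points to instead of using an inline sample.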

Each operator mentioned in the score becomes a process (a "player") on the processing node. Players communicate with each other, and with the section leader process. Section leader processes communicate with the conductor process, which is the only process to write entries to the job log (thus avoiding contention).
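
As a rough illustration of the hierarchy just described, the following Python sketch estimates how many OS processes a job would start. It is a toy model built on assumptions: the `Operator` class and the operator names are hypothetical, and the score the engine actually composes can differ (for example, if it combines operators or inserts extra ones), so check the real numbers with APT_DUMP_SCORE.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical stand-in for one operator listed in the score."""
    name: str
    parallel: bool = True  # False: the operator runs on a single node only


def process_count(operators, node_count):
    """Estimate the number of OS processes: conductor + section leaders + players."""
    conductor = 1
    section_leaders = node_count  # one per processing node
    # A parallel operator gets one player per node, a sequential one gets one.
    players = sum(node_count if op.parallel else 1 for op in operators)
    return conductor + section_leaders + players


job = [
    Operator("read_file", parallel=False),
    Operator("transform"),
    Operator("write_dataset"),
]
print(process_count(job, node_count=4))  # 1 + 4 + (1 + 4 + 4) = 14
```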

I have no understanding about the rest of your questions. All the above processes are regular operating system processes running the osh executable ($PXHOME/bin/osh); the environment variable APT_PM_SHOW_PIDS can cause the process ID of each to be logged. I definitely do not understand what you mean by "internally".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vsi
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

Who creates the conductor process? Is it the DSEngine? What happens when we run a job on UNIX using RunDsJob <JobNm> <Params...>? How does the conductor process get initiated? Please explain.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Conductor, section leader, and player are just names for the roles that processes play (the conductor, for example, controls the others). They are all started by OSH, and hence ultimately by the DSEngine, and each one is an ordinary OS process. As soon as you start a job, the flow Ray described begins, no matter how you start it: from the GUI, from BASIC code, or from the dsjob UNIX command.
As mentioned, there is one conductor per job.
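
For the command-line route, here is a small Python sketch of how a wrapper script might assemble a dsjob call. It only builds the argument list and does not run anything. The project name, job name, and parameter are hypothetical, and I am assuming the common `dsjob -run -param name=value project job` form, so check it against the dsjob documentation for your release.

```python
def build_dsjob_run(project, job, params=None):
    """Build the argument list for a `dsjob -run` call (nothing is executed)."""
    args = ["dsjob", "-run"]
    # Each job parameter is passed as its own -param name=value pair.
    for name, value in (params or {}).items():
        args += ["-param", f"{name}={value}"]
    args += [project, job]
    return args


# Hypothetical project, job, and parameter names.
cmd = build_dsjob_run("MyProject", "MyParallelJob", {"RunDate": "2007-01-31"})
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where dsjob is on the PATH; from that point the engine starts the conductor exactly as it would for a job started from the GUI.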
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vsi
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

Thank you very much for clarifying my doubt.
ashik_punar
Premium Member
Posts: 71
Joined: Mon Nov 13, 2006 12:40 am

Post by ashik_punar »

Hi Ray/Kumar,

This is a lovely piece of information. Thanks a lot for providing insight into this topic; I am really thankful to you for this. I still have a question along the same lines, so if you find some free time, please help me with some input on it. The question is:

1. How does the Transformer work internally during the execution of a job?

If you can point me to documents where I can read about these things in detail, please do that as well.

Thanks in advance,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The only documents are the manuals, including the Orchestrate manuals.

You can inspect the C++ source code generated by a Transformer stage, either in an error message or in the subdirectory called RT_SCnnn, where nnn is the job number.

SELECT JOBNO FROM DS_JOBS WHERE NAME = '<<Job Name>>';
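
Once the query has returned the job number, the directory name follows mechanically. Here is a minimal Python sketch, assuming the RT_SCnnn subdirectory sits directly under the project directory; the project path and job number are made up for illustration.

```python
from pathlib import PurePosixPath


def transformer_source_dir(project_dir, job_number):
    """Return the RT_SCnnn directory for a job, given its JOBNO value."""
    return PurePosixPath(project_dir) / f"RT_SC{job_number}"


# Hypothetical project directory and job number.
print(transformer_source_dir("/opt/datastage/Projects/MyProject", 123))
```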