Job Compilation, Execution and Combination

fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

Hi,

I'm trying to understand how job compilation and execution work.

As I understand it, job compilation does two things:

1. Generates an OSH script that represents the data flow and the stages.
2. Generates transform code for each Transformer, then compiles it into C++ and then into the corresponding native operators.

Is this true?

Also, during execution several jobs can be automatically combined, right? This automatic combination can be switched on or off at run time via a job parameter. How is this possible, especially when the C++ code has already been generated and compiled?

Please explain.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Several jobs? No, but operators within a single job can be combined.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Your understanding of the compilation process is essentially correct. You will note that the generated OSH contains no mention of parallelism.

There's a step that occurs before execution actually starts, though after the job run request has been issued, and that step is composition of the "score" - the script that is actually executed. Essentially this is a resource allocation exercise, which takes into account the current value of APT_CONFIG_FILE (for the appropriate degree of parallelism) and the opportunities for constructing combined operators and composite operators, among other things.
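To illustrate the role of APT_CONFIG_FILE in that step, here is a minimal Python sketch that lists the logical nodes declared in a configuration file's text. It assumes the usual `node "name" { ... }` layout; the host name and paths in the sample are hypothetical, and the node count is only a rough proxy for the default degree of parallelism, since node pools and stage constraints can change it.

```python
import re

# Hypothetical configuration text (host name and paths are made up for illustration).
SAMPLE_CONFIG = '''
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds" {pools ""}
    resource scratchdisk "/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds" {pools ""}
    resource scratchdisk "/scratch" {pools ""}
  }
}
'''

def node_names(config_text):
    """Return the logical node names declared in the configuration text."""
    # Each node declaration starts a line with: node "<name>"
    return re.findall(r'^\s*node\s+"([^"]+)"', config_text, flags=re.MULTILINE)

names = node_names(SAMPLE_CONFIG)
print(names, len(names))
```

Pointing the same job at a configuration file with a different number of nodes changes the score, not the compiled OSH.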

Only then does the conductor process start the section leader processes and distribute the score to them to be executed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
fmou
Participant
Posts: 124
Joined: Sat May 28, 2011 9:48 pm

Post by fmou »

Thanks, Ray.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ray, could you touch upon the difference between combined operators and composite operators? Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A combined operator arises when the conductor process identifies two or more parallel operators that are adjacent in the design and are specified to execute with the same degree of parallelism on the same nodes. It will (aggressively) cause these to be executed as a single process, as APT_CombinedOperatorController.
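The grouping rule can be sketched in a few lines of Python. This is a toy model, not the engine's actual algorithm: the `Op` class, the operator names and the linear chain are all hypothetical, and the only criterion modelled is "adjacent and running on the same nodes".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str     # hypothetical operator name
    nodes: tuple  # names of the nodes the operator is specified to run on

def combine_adjacent(ops):
    """Group a linear chain of operators into runs that share the same nodes."""
    groups = []
    for op in ops:
        # Extend the current group only if the previous operator uses the same nodes.
        if groups and groups[-1][-1].nodes == op.nodes:
            groups[-1].append(op)
        else:
            groups.append([op])
    return groups

chain = [
    Op("import", ("node1",)),
    Op("transform", ("node1", "node2")),
    Op("modify", ("node1", "node2")),
    Op("export", ("node1",)),
]

for group in combine_adjacent(chain):
    print([op.name for op in group])
```

In this model each group with more than one member corresponds to one process per node instead of one process per operator per node.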

A composite operator is where one stage generates two or more operators. For example, a Lookup stage generates LUT_CreateOp and LUT_ProcessOp operators. A Data Set stage with an input link and "overwrite" as its mode generates one (sequential) operator to delete the Data Set and a second (parallel) copy operator to populate the Data Set. Composite operators can be detected in the score by the IN keyword, for example LUT_CreateOp IN LookupStage42.
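As a rough illustration of spotting composite operators, the Python sketch below scans lines of text for the "operator IN stage" wording and groups the operators by stage. The sample lines are hypothetical and are not a real score dump; only the `LUT_CreateOp IN LookupStage42` form is taken from the description above.

```python
import re
from collections import defaultdict

# Hypothetical fragments; a real score contains much more than this.
SAMPLE_LINES = [
    "LUT_CreateOp IN LookupStage42",
    "LUT_ProcessOp IN LookupStage42",
    "copy",
]

# Matches "<operator> IN <stage>" anywhere in a line.
IN_PATTERN = re.compile(r"(\w+)\s+IN\s+(\w+)")

def composite_stages(lines):
    """Map each stage name to the operators reported as running IN it."""
    found = defaultdict(list)
    for line in lines:
        match = IN_PATTERN.search(line)
        if match:
            operator, stage = match.groups()
            found[stage].append(operator)
    return dict(found)

print(composite_stages(SAMPLE_LINES))
```

A stage that maps to two or more operators is a composite in the sense described here.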
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Excellent, thanks much.