Hi,
I'm just trying to understand how job compilation and execution work.
To my understanding, job compilation does two things:
1. Generates an OSH script that represents the data flow and stages
2. Generates transform code for each Transformer, then compiles it into C++ and then into the corresponding native operators
Is this true?
Besides, during execution several jobs can be automatically combined together, right? This automatic combination can be switched on or off at runtime via a job parameter. How is this possible, especially when the C++ code has already been generated and compiled?
Please explain.
Thanks
Job Compilation, Execution and Combination
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Your understanding of the compilation process is essentially correct. You will note that the generated OSH contains no mention of parallelism.
There's a step that occurs before execution actually starts, though after the job run request has been issued, and that step is composition of the "score" - the script that is actually executed. Essentially this is a resource allocation exercise, which takes into account the current value of APT_CONFIG_FILE (for the appropriate degree of parallelism) and the opportunities for constructing combined operators and composite operators, among other things.
Only then does the conductor process start the section leader processes and distribute the score to them to be executed.
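To make the "resource allocation" idea concrete, here is a toy model in Python. It is only a sketch, not how the engine is implemented: the Operator class, the compose_score function and the operator and node names are all made up for illustration, and the node list merely stands in for whatever APT_CONFIG_FILE points to. The point is that the same compiled flow yields a different score when the configuration changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    parallel: bool  # False means the operator runs sequentially


def compose_score(operators, nodes):
    """Toy score: pair each operator with the nodes it would run on."""
    score = []
    for op in operators:
        # Assumption: a sequential operator gets one node, a parallel one gets all.
        assigned = list(nodes) if op.parallel else [nodes[0]]
        score.append((op.name, assigned))
    return score


def count_processes(score):
    """One player process per operator per assigned node (before any combination)."""
    return sum(len(assigned) for _, assigned in score)


# The compiled flow is fixed; only the node list changes between runs.
flow = [
    Operator("import", False),
    Operator("transform", True),
    Operator("export", True),
]
two_nodes = compose_score(flow, ["node1", "node2"])
four_nodes = compose_score(flow, ["node1", "node2", "node3", "node4"])

print(count_processes(two_nodes))   # 5
print(count_processes(four_nodes))  # 9
```

Nothing in the flow itself mentions parallelism, which matches the observation about the generated OSH.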
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
A combined operator is formed when the conductor process identifies two or more parallel operators that are adjacent in the design and are specified to execute with the same degree of parallelism on the same nodes. It will (aggressively) cause these to be executed as a single process, under APT_CombinedOperatorController.
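A rough Python sketch of that grouping rule, to show why it can be toggled at run time without recompiling anything. The function name, the enable_combination flag and the operator names are hypothetical; the real engine applies more conditions than "adjacent and same nodes". As far as I know, the actual switch is the APT_DISABLE_COMBINATION environment variable, but treat that as an assumption and check the documentation for your version.

```python
def combine_adjacent(score, enable_combination=True):
    """Group adjacent operators that are assigned exactly the same nodes."""
    groups = []
    for name, nodes in score:
        if enable_combination and groups and groups[-1][1] == nodes:
            # Same nodes as the previous group: run in the same process.
            groups[-1][0].append(name)
        else:
            groups.append(([name], nodes))
    return groups


def process_count(groups):
    """One process per group per node."""
    return sum(len(nodes) for _, nodes in groups)


# Hypothetical score entries: (operator name, nodes it runs on).
score = [
    ("import", ("node1",)),
    ("transform", ("node1", "node2")),
    ("filter", ("node1", "node2")),
    ("export", ("node1", "node2")),
]

combined = combine_adjacent(score)
separate = combine_adjacent(score, enable_combination=False)

print(len(combined), process_count(combined))  # 2 3
print(len(separate), process_count(separate))  # 4 7
```

The compiled operators are identical in both runs. Only the way they are packed into processes differs, and that decision is made when the score is composed.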
A composite operator is where one stage generates two or more operators. For example, a Lookup stage generates LUT_CreateOp and LUT_ProcessOp operators. A Data Set stage with an input link and "overwrite" as its mode generates one (sequential) operator to delete the Data Set and a second (parallel) copy operator to populate the Data Set. Composite operators can be detected in the score by the IN keyword, for example LUT_CreateOp IN LookupStage42.
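If you want to pick composite operators out of a saved score, something like the following Python sketch may help. The sample text is invented and reuses only the "operator IN stage" pattern from the example above; a real score dump has more surrounding detail, so the regular expression would need adjusting. The helper name find_composites is hypothetical.

```python
import re
from collections import defaultdict

# Invented sample lines, not real score output.
SAMPLE = """\
LUT_CreateOp IN LookupStage42
LUT_ProcessOp IN LookupStage42
copy IN DataSet7
SortOp
"""

# Matches "<operator> IN <stage>", ignoring the case of the keyword.
PATTERN = re.compile(r"^\s*(\w+)\s+IN\s+(\w+)\s*$", re.IGNORECASE)


def find_composites(text):
    """Return {stage: [operators]} for stages that produced two or more operators."""
    by_stage = defaultdict(list)
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            operator, stage = match.groups()
            by_stage[stage].append(operator)
    return {stage: ops for stage, ops in by_stage.items() if len(ops) >= 2}


print(find_composites(SAMPLE))
# {'LookupStage42': ['LUT_CreateOp', 'LUT_ProcessOp']}
```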