Data Stage Execution

bks_prasad
Participant
Posts: 10
Joined: Thu Mar 18, 2004 12:08 am

Post by bks_prasad »

Hi All,

I want to know the sequence of steps DataStage performs in the background during the execution of a DataStage job. Let's say I have a simple job:
ODBC Stage---->Transformer Stage---->ODBC Stage

Thanks in Advance
Prasad
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A process is started executing uv (ultimately uvsh) running DSD.RUN, which is a DataStage BASIC routine that runs server jobs. This obtains job parameter values, or run-time default values for them.

Before starting anything else, the job will execute its before-job subroutine.

In your design, the job process forks a child uv process running DSD.StageRun, which is a DataStage BASIC routine that runs the Transformer stage.

Source code for DSD.RUN and DSD.StageRun is the intellectual property of Ascential Software Corporation, and is not in the public domain.

The source code for the subroutine that IS the Transformer stage can be viewed in the RT_BPnnn directory, where nnn is the job number.

If all the pieces are in place, a Transformer stage does the following.

1. Execute a before-stage subroutine.
2. Open all its input and output links (which means that the "Open" function exposed by the passive stage on the other end of that link is executed; for example a file is opened, or a connection established to a database and SQL statement(s) prepared).
3. Initialize all stage variables, in the order in which they are declared.
4. For each row on the stream input link (updating status information for each 1000 rows processed):
  (a) get the row from the stream input link (which means that the "Get Next" function exposed by the passive stage is executed and may return a row, an error, or a "no more data" token)
  (b) for each reference input link, in the declared execution order:
        (i)  evaluate reference key expression(s)
        (ii)  issue a reference input query (which means that the "Get Row By Key" function exposed by the passive stage is executed and may return a row containing data, or a row containing all NULL columns or, in an ODBC or UV stage, multiple rows)
  (c) evaluate all stage variables, in the order in which they are declared
  (d) initialize the REJECTED variable to "true" (this detects whether a row has been processed on to an output link)
  (e) for each output link, in the declared execution order, evaluate its constraint expression (a "rejects" output link has an effective constraint of (REJECTED = @TRUE)) then, if the constraint expression is satisfied, set the REJECTED variable to "false", evaluate each of the column derivation expressions in the order in which they have been declared, then send the row to the output link (which means that the "Put" function exposed by the passive stage is executed, and may return a success code or an error code)
5. Close all output and input links.
6. Execute an after-stage subroutine.
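The per-row part of the list above (steps 3 and 4) can be sketched roughly as follows. This is only an illustration in Python, not the generated DataStage BASIC in RT_BPnnn; the names OutputLink and run_transformer, and the sample data, are hypothetical. Before/after subroutines and the opening and closing of links are left out.

```python
class OutputLink:
    """Hypothetical output link: a constraint, ordered derivations, and a sink."""

    def __init__(self, name, constraint, derivations, is_reject=False):
        self.name = name
        self.constraint = constraint    # callable(row, stage_vars) -> bool
        self.derivations = derivations  # dict: column name -> callable(row, stage_vars)
        self.is_reject = is_reject
        self.rows = []                  # stands in for the passive stage's "Put"

    def put(self, out_row):
        self.rows.append(out_row)


def run_transformer(input_rows, lookups, stage_var_defs, outputs):
    # Step 3: initialize stage variables in declaration order.
    stage_vars = {name: initial for name, initial, _ in stage_var_defs}
    for in_row in input_rows:                      # step 4(a): get next row
        row = dict(in_row)
        for ref_name, key_expr, table in lookups:  # step 4(b): reference lookups
            # None models the "all NULL columns" result of a failed lookup.
            row[ref_name] = table.get(key_expr(row))
        for name, _, expr in stage_var_defs:       # step 4(c): stage variables
            stage_vars[name] = expr(row, stage_vars)
        rejected = True                            # step 4(d)
        for link in outputs:                       # step 4(e): declared order
            if link.is_reject:
                satisfied = rejected               # effective (REJECTED = @TRUE)
            else:
                satisfied = link.constraint(row, stage_vars)
            if satisfied:
                rejected = False
                link.put({col: fn(row, stage_vars)
                          for col, fn in link.derivations.items()})


# Made-up data: one reference lookup, one stage variable, two output links.
names = {1: {"name": "a"}}
good = OutputLink(
    "good",
    lambda r, sv: sv["is_valid"],
    {
        "id": lambda r, sv: r["id"],
        "name": lambda r, sv: (r["names"] or {}).get("name"),
        "amt": lambda r, sv: r["amt"],
    },
)
rejects = OutputLink(
    "rejects",
    None,
    {"id": lambda r, sv: r["id"], "amt": lambda r, sv: r["amt"]},
    is_reject=True,
)
run_transformer(
    [{"id": 1, "amt": 50}, {"id": 2, "amt": -5}],
    [("names", lambda r: r["id"], names)],
    [("is_valid", False, lambda r, sv: r["amt"] >= 0)],
    [good, rejects],
)
```

With this data the first row goes down the "good" link and the second, which satisfies no ordinary constraint, goes down the "rejects" link.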
When the DSD.StageRun process closes, it notifies its parent. An entry containing the pid and the token "[Done]" appears in the parent process's output file in the &PH& directory.

Any abort messages are captured into the stage's output file in the &PH& directory. If the job has not aborted immediately, these messages are copied from there to the job log. Otherwise, this can be done when the job is reset.

The parent job process periodically checks that the child process is still present by sending it a null signal (kill -0).
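For anyone unfamiliar with the null signal: signal 0 delivers nothing, but the call still reports whether the target process exists. A minimal sketch of the same check in Python, assuming a POSIX system (the function name child_is_alive is made up for illustration, and this is not how DSD.RUN itself is written):

```python
import os


def child_is_alive(pid: int) -> bool:
    """Probe a process with the null signal, like `kill -0 pid`."""
    try:
        os.kill(pid, 0)          # signal 0: existence/permission check only
    except ProcessLookupError:   # no such process
        return False
    except PermissionError:      # process exists but belongs to another user
        return True
    return True


# The current process certainly exists, so this probe succeeds.
alive = child_is_alive(os.getpid())
```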

Once all its active stages have closed, the job process executes its after-job subroutine before exiting.

Both the job process and the stage process periodically update their respective entries in the RT_STATUSnnn table in the Repository. It is this table that is viewed in the Status and Monitor views in Director, and that supplies the performance statistics when they are shown on the Designer canvas.

As necessary, entries are written to the RT_LOGnnn table in the Repository. This is the table viewed in the Log view in Director.

More complex job designs obtain dependency and other information from the RT_CONFIGnnn table in the Repository.

For the ODBC stage:
  • the "Open" function is BCI.Open, which - among other things - issues SQLConnect, SQLPrepare and SQLExecute calls
  • the "Get Next" function is BCI.GetNext, which - among other things - issues SQLFetch calls
  • the "Get Row By Key" function is BCI.GetByKey, which also issues SQLFetch calls
  • the "Put" function is BCI.Put, which issues SQLExecute calls
  • the "Close" function is BCI.Close, which - among other things - issues a SQLDisconnect call
Why did you want to know?
How do you plan to use your knowledge?
You can't change the way it works. :twisted:

If you enable stage tracing for the Transformer stage, and enable all the choices, you can see all this happening. I omitted parameter substitution, determination of property values and other substitutions, for clarity.

If you want a detailed reply, you might be better off posting on ADN. :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:If you want a detailed reply, you might be better off posting on ADN.
And this was...? :roll: Why go there when you can get the Full Wurlod right here? :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

chulett wrote:
ray.wurlod wrote:If you want a detailed reply, you might be better off posting on ADN.
And this was...? :roll: Why go there when you can get the Full Wurlod right here? :lol:
In the tradition of having a 'Hello World' tutorial program, can we have a 'Hello Wurlod' program?
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

My name is of Swiss origin, the family being located mainly in and around Lausanne. Visit Wurlod Architects to see some innovative architecture.
As you can see from my profile, I'm part of the Australian branch of the family, begun when three Swiss ancestors were shipwrecked on the southern shore of Australia. (They made money during the Victorian gold rush in the 1850s by opening a pub.)
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

and here i was thinking that ozzies were convicts from england.

south africa vs australia this weekend ray...
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

George Gregan's 100th test: they'll be wanting to make it a good one for him, especially since he's returning from a shoulder injury. :D
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

ha! good luck - it's going to be a humdinger! :D
dnzl