Order of execution

Bilwakunj
Participant
Posts: 59
Joined: Fri Sep 10, 2004 7:00 am

Post by Bilwakunj »

Hi,
I want to know the order of execution in PX. Say my PX job has processed one row and the second row is about to begin. Does PX first flush all the information from the previous row before it starts processing the new row, or does it simply overwrite the previous record's data as it reaches each column derivation?
To put it another way, say I have the following columns:
col A
col B
After the first record has been processed and before the second one begins, will the values of A and B be initialized (say to 0 and 0), or will the second record simply overwrite the first record's data when processing reaches A, and so on?

Thanks in advance!!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Almost certainly PX will be giving you "pipeline parallelism", in which a downstream stage can be processing row #1 while an upstream stage has already begun processing row #2 (even though row #1 is still somewhere in the job). You can code to prevent this, but that's defeating one of the things that make parallel jobs go fast.

For more information on pipeline and partition parallelism, read Chapter 2 of the Parallel Job Developer's Guide.
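
As a rough analogy only (this is not how the PX engine is implemented), pipeline parallelism can be sketched in Python with one thread per stage and a queue between stages. The names `run_stage`, `run_pipeline` and `SENTINEL` are made up for this illustration. Each row travels as its own record, so nothing from row #1 has to be flushed or overwritten before row #2 starts moving.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-data marker


def run_stage(func, inbox, outbox):
    """One 'stage': take rows from inbox, apply func, pass the result on."""
    while True:
        row = inbox.get()
        if row is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage we are done
            return
        outbox.put(func(row))


def run_pipeline(rows, stage_funcs):
    """Run every stage in its own thread, linked by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stage_funcs)
    ]
    for t in threads:
        t.start()

    # The upstream end can already feed row 2 while row 1 is still downstream.
    for row in rows:
        queues[0].put(row)
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [lambda r: r + 1, lambda r: r * 10]))
```

Because each stage here is a single thread reading a FIFO queue, row order is preserved within the pipeline; with partitioning added, ordering would only hold within a partition.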
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

You need to sort your data in order to maintain control in this environment. Partition and sort on the key fields, and sort (without partitioning) on the dependent fields.

You can control how data is processed by using stage variables in the Transformer stage. Stage variables are evaluated in order for each row as it is received, so progressive calculations, among other ideas, can be done.

Do a search for "stage variables" to get some ideas on how to handle this concept.
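
A minimal Python sketch of the idea, not DataStage syntax: the field names `key`, `seq` and `amount` and the variables `sv_prev_key` and `sv_total` are invented for illustration. It assumes the data has been sorted on the key and then on a dependent field, and that the variables keep their values from one row to the next, which is what makes a running calculation possible.

```python
def running_totals(rows):
    """Emulate stage variables that persist from row to row."""
    # Sort on the key field, then on the dependent field.
    rows = sorted(rows, key=lambda r: (r["key"], r["seq"]))

    sv_prev_key = None  # remembers the key of the previous row
    sv_total = 0        # progressive calculation carried across rows
    out = []
    for r in rows:
        # Evaluated in order: sv_total still sees the OLD sv_prev_key here.
        if r["key"] != sv_prev_key:
            sv_total = r["amount"]      # new group, restart the total
        else:
            sv_total = sv_total + r["amount"]
        # Only now remember the current key for the next row.
        sv_prev_key = r["key"]
        out.append((r["key"], r["seq"], sv_total))
    return out
```

The order of the two assignments matters: the "previous key" variable is updated last, so everything evaluated before it in the same row still sees the prior row's value.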
Bilwakunj
Participant
Posts: 59
Joined: Fri Sep 10, 2004 7:00 am

Post by Bilwakunj »

Thanks Ray.
I have the situation described below. I had posted this on the forum before, but here is the full version of the requirement, so I'm posting it again.

Col A - char(8)
Col B - char(2)
Col A_Date - char(8)
Col D_Date - char(8)
Col C - char(3)

My job requires that records matching on Col A, Col B and Col C be marked as related. Within each group of related records I need to find the "earliest A_Date". If the A_Date of the next record falls between the earliest A_Date of the previous record and (D_Date + 1) of the previous record, then its earliest A_Date is the same as that of the previous record; otherwise the "earliest A_Date" for this record is its own A_Date. The comparison should continue in this way through the group, with the earliest A_Date determined by each match.
I tried this using a stage variable, but as the life of a stage variable is limited to one record, I couldn't get the correct result. I also tried a lookup, but I can't both read and write the same lookup file in order to update the new earliest A_Date among the related records.
Please let me know how this can be done in PX.
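
One possible reading of this requirement, sketched in Python rather than as a PX job. Several things here are assumptions: that "related" means rows sharing the same A, B and C values, that the char(8) dates are in YYYYMMDD form, that rows are processed in A_Date order within each group, and that "D_Date + 1" means one day after the previous row's D_Date. The output field name `Earliest_A_Date` and the function names are invented for the example.

```python
from datetime import datetime, timedelta


def parse(d):
    """Parse a char(8) date, assumed to be YYYYMMDD."""
    return datetime.strptime(d, "%Y%m%d").date()


def assign_earliest(rows):
    # Group related rows together and order them by A_Date within the group.
    rows = sorted(rows, key=lambda r: (r["A"], r["B"], r["C"], r["A_Date"]))

    prev_key = None       # (A, B, C) of the previous row
    prev_earliest = None  # earliest A_Date carried from the previous row
    prev_d = None         # D_Date of the previous row
    out = []
    for r in rows:
        key = (r["A"], r["B"], r["C"])
        a_date = parse(r["A_Date"])
        in_window = (
            key == prev_key
            and prev_earliest <= a_date <= prev_d + timedelta(days=1)
        )
        # Inside the window: inherit; otherwise start from this row's A_Date.
        earliest = prev_earliest if in_window else a_date
        out.append({**r, "Earliest_A_Date": earliest.strftime("%Y%m%d")})
        # Remember this row's values for the comparison on the next row.
        prev_key, prev_earliest, prev_d = key, earliest, parse(r["D_Date"])
    return out
```

The three `prev_*` names play the role that values remembered across rows would play in a Transformer; no lookup file needs to be read and written at the same time under this reading.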
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Search the forum for how to use stage variables to remember values from the previous row. Once you can do this (which is very easy), the rest should follow.