
Time Spent on Each Stage

Posted: Mon Dec 28, 2009 7:19 am
by DS_FocusGroup
I know there is a way to check in the log how much time is being spent on each stage, but I can't remember what it is. Is there an environment variable that needs to be enabled?

Posted: Mon Dec 28, 2009 10:06 am
by chulett
Perhaps APT_PM_PLAYER_TIMING set to True? :?

Posted: Mon Dec 28, 2009 10:19 am
by Akumar1
chulett wrote: Perhaps APT_PM_PLAYER_TIMING set to True? :? ...
Yes, that's correct.
$APT_PM_PLAYER_TIMING (set to True): this reporting option lets you see what each operator in a job is doing, especially how much data it is handling and how much CPU it is consuming.

Posted: Mon Dec 28, 2009 10:39 am
by DS_FocusGroup
Shouldn't the elapsed time of each stage match the total production run time, or is the production run time a sum of something else? For example, I join two tables using a Join stage and write to a Data Set. The log comes up with something like this:
Stage1_0,0: Operator completed. status: APT_StatusOk elapsed: 2.66 user: 0.08 sys: 0.01 (total CPU: 0.09)
Stage2_0,0: Operator completed. status: APT_StatusOk elapsed: 8.08 user: 1.30 sys: 0.05 (total CPU: 1.35)
Join_16,0: Operator completed. status: APT_StatusOk elapsed: 8.23 user: 0.01 sys: 0.01 (total CPU: 0.02)
Data_Set_26,0: Operator completed. status: APT_StatusOk elapsed: 8.24 user: 0.00 sys: 0.00 (total CPU: 0.00)
Join_16,0: Operator completed. status: APT_StatusOk elapsed: 8.24 user: 0.05 sys: 0.00 (total CPU: 0.05)
main_program: Startup time, 0:07; production run time, 0:08.
And why is it showing info for the single Join stage twice, with two separate sets of figures? :roll:
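
For anyone wanting to compare these figures side by side, the "Operator completed" lines quoted above follow a regular enough pattern to be pulled apart with a small script. This is only a rough sketch: the regular expression is inferred from the five lines in this post, not from any documented log format, and the names parse_timing, rows and by_operator are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the log lines quoted in this post (an assumption,
# not a documented format): "<operator>,<partition>: Operator completed. ..."
LINE_RE = re.compile(
    r"^(?P<operator>[^,]+),(?P<partition>\d+): Operator completed\. "
    r"status: (?P<status>\S+) "
    r"elapsed: (?P<elapsed>[\d.]+) "
    r"user: (?P<user>[\d.]+) "
    r"sys: (?P<sys>[\d.]+) "
    r"\(total CPU: (?P<cpu>[\d.]+)\)"
)

SAMPLE_LOG = """\
Stage1_0,0: Operator completed. status: APT_StatusOk elapsed: 2.66 user: 0.08 sys: 0.01 (total CPU: 0.09)
Stage2_0,0: Operator completed. status: APT_StatusOk elapsed: 8.08 user: 1.30 sys: 0.05 (total CPU: 1.35)
Join_16,0: Operator completed. status: APT_StatusOk elapsed: 8.23 user: 0.01 sys: 0.01 (total CPU: 0.02)
Data_Set_26,0: Operator completed. status: APT_StatusOk elapsed: 8.24 user: 0.00 sys: 0.00 (total CPU: 0.00)
Join_16,0: Operator completed. status: APT_StatusOk elapsed: 8.24 user: 0.05 sys: 0.00 (total CPU: 0.05)
main_program: Startup time, 0:07; production run time, 0:08.
"""


def parse_timing(log_text):
    """Return one dict per 'Operator completed' line; other lines are skipped."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. the main_program summary line
        rows.append({
            "operator": match["operator"],
            "partition": int(match["partition"]),
            "status": match["status"],
            "elapsed": float(match["elapsed"]),
            "user": float(match["user"]),
            "sys": float(match["sys"]),
            "cpu": float(match["cpu"]),
        })
    return rows


rows = parse_timing(SAMPLE_LOG)

# Group the entries by operator name so repeated operators stand out.
by_operator = defaultdict(list)
for row in rows:
    by_operator[row["operator"]].append(row)

for name, entries in by_operator.items():
    print(name, [(e["elapsed"], e["cpu"]) for e in entries])
```

Grouping by operator name makes it obvious which operators report more than once, which is the Join_16 case asked about here.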

Posted: Mon Dec 28, 2009 11:10 am
by chulett
Well... how many nodes did the job run on?

Posted: Mon Dec 28, 2009 3:42 pm
by ray.wurlod
Yes: the Performance Analysis tool. It provides precisely this information, as well as information about other resources (e.g. memory, disk).

Posted: Tue Dec 29, 2009 6:33 am
by DS_FocusGroup
Two nodes. And my question still stands: shouldn't the elapsed time for each stage be equal to the production run time it gives at the end?

Posted: Tue Dec 29, 2009 6:38 am
by chulett
Two nodes = two sets of stats. Since the nodes run in parallel, why would you expect all of the stats to sum up to the total run time? Constrain it to one node and see if the stats make more sense that way.

Posted: Tue Dec 29, 2009 7:07 am
by DS_FocusGroup
OK, thanks :)

Posted: Tue Dec 29, 2009 3:35 pm
by ray.wurlod
Total production run time includes overheads that are not counted against any one stage (operator). Further, stages are executing simultaneously, so that the elapsed time for any one stage could potentially be the same as (or nearly the same as) the job itself.
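
To put numbers on that, here is a small sketch using the elapsed figures from the log posted earlier in the thread. The variable names are illustrative only. Because the operators overlap in time, the longest single elapsed value is the one to compare with the production run time, not the sum.

```python
# Elapsed seconds from the "Operator completed" lines posted earlier in the thread.
elapsed = [
    ("Stage1_0", 2.66),
    ("Stage2_0", 8.08),
    ("Join_16", 8.23),
    ("Data_Set_26", 8.24),
    ("Join_16", 8.24),
]

# Production run time reported by main_program ("0:08"), in seconds.
production_run_time = 8

# Adding the figures up treats the operators as if they ran one after another.
total_elapsed = round(sum(seconds for _, seconds in elapsed), 2)

# Operators run simultaneously, so the longest-running one is the better
# comparison point for the job's wall-clock time.
longest_elapsed = max(seconds for _, seconds in elapsed)

print("sum of elapsed times:", total_elapsed)
print("longest elapsed time:", longest_elapsed)
print("production run time: ", production_run_time)
```

The sum comes out several times larger than the run time the job reports, while the longest single figure sits close to it, which is what you would expect from stages executing at the same time.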