APT_PM_PLAYER_TIMING Env Variable enabled

srds2
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

APT_PM_PLAYER_TIMING Env Variable enabled

Post by srds2 »

Hello all,

One of my DataStage parallel jobs runs for a long time, and the run time varies from day to day with the same number of input records: on one day it completes in about an hour, on another it takes four hours. I have enabled the APT_PM_PLAYER_TIMING variable in Administrator to see the elapsed time of each step.
Director then shows messages like this:
Operator completed. status: APT_StatusOk elapsed: 3225.97 user: 208.53 sys: 37.54 (total CPU: 246.07)
Can someone please let me know what "user:", "sys:" and "total CPU:" signify?

Thanks
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Those indicate CPU time for the operator/process providing the message:

user: CPU time utilized by the process
sys: CPU time used by the operating system to support the process
Total CPU: user + sys
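
If you want to see the same three measurements for an arbitrary command, the shell's time builtin reports them (the figures below are invented for illustration):

time sort /tmp/bigfile > /dev/null

real    0m52.40s    <-- wall-clock time, like "elapsed:"
user    0m7.10s     <-- CPU time spent in the process itself, like "user:"
sys     0m1.30s     <-- CPU time the kernel spent on the process's behalf, like "sys:"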

With the run time varying as wildly as you describe, start by looking at the following areas (a rough checking sketch follows the list):

1) Are the data volumes (number of bytes, not just records) consistent from run to run? Each run may process 20 million records, but is it only 1 GB one time and 5 GB another?
2) How heavily is the DataStage server utilized during the different job runs? For example: is only your job running, or are 20 other jobs running at the same time as yours?
3) If you're using external sources/targets, such as databases, how heavily are they utilized during the different job runs?
4) Job and data characteristics: if the job performs database updates, perhaps the longer runs apply more updates to the database than the shorter runs.
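
A rough sketch for checking 1), 2) and 4) from the engine host (the paths and file names are examples only; substitute your own):

# 1) Byte volume, not just record count, of the input per run:
du -sh /data/landing/daily_extract_*.dat

# 2) Overall server load while your job runs:
uptime          # load averages over the last 1/5/15 minutes
vmstat 5 3      # run queue, memory and CPU, sampled over 15 seconds

# 4) Compare the number of rows updated per run, from the job log
#    or from the DBA's statistics.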

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
srds2
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thank you for the response.
What's really interesting in the job logs is that, between the longer run and the shorter run, there is no significant change in the values below for each step:

user: (Are these values in seconds or minutes?)
sys:
Total CPU:
The only change is the elapsed time.

To your questions:
1) The data volume is also the same.
2) I have to check this, which I assume is the major culprit.
3) I have already talked to the DBA group, and there is no significant change on their side between a long run and a short run.
srds2
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Also, if there are 20 other DataStage jobs running at the same time, will my job abort, or will it be held internally until some of those jobs finish, and only then start running?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

DataStage will attempt to start the job when it is submitted (through Director, Designer or on the command line with dsjob). If sufficient resources are not available, whether those configured within the DataStage server (uvconfig) or system resources, the job may abort. If the job doesn't abort due to resource availability, then it's at the mercy of the amount of CPU time the O/S can give its processes (and no, you can't change the priority of a job).
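
For reference, a command-line submission looks like this (the project and job names are placeholders):

# Run the job and wait for it to finish; the exit status of dsjob
# then reflects the final job status:
dsjob -run -jobstatus MyProject MyParallelJob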

CPU usage values are in seconds. Given that the data volumes are roughly equal, it's not surprising that there is little change in CPU usage; it indicates that the cause is likely external to the job itself.
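
You can quantify that from the message in your first post: the operator was on CPU for only a small fraction of its elapsed time, which means it spent most of the run waiting (on I/O, the database or the CPU run queue) rather than computing:

# total CPU / elapsed, using the figures from the original message:
awk 'BEGIN { printf "%.1f%%\n", 246.07 / 3225.97 * 100 }'   # prints 7.6%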

Network and disk storage performance and contention can also come into play, especially in heavily utilized environments.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Certain external databases may also have policies under which, at given times of day, priority is given to user queries rather than batch load IDs.

So your load process might actually be throttled by the database settings.
srds2
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thanks for the response, James.
(If the job doesn't abort due to resource availability, then it's at the mercy of the amount of CPU time the O/S can give its processes.)
So, from what you said, there can be a case where the job still shows a running state without aborting, even though the required resources are not available, and it starts its actual processing whenever the OS assigns the resources.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Partly correct.

The job won't enter a running state until all of the operators have been started successfully. If the O/S can't allocate the resources to start a process (operator), the job will abort. Once all of the operator processes have started, the operators initialize themselves, which can require additional resources (memory for buffers, communications links, files, etc.) and can cause job failure if the required resources are not available.

Assuming the job starts and initializes successfully, then it has to share resources (CPU, system memory, storage, etc.) with other jobs and processes running on the server. The more active processes on the server, the less time per second a process receives to execute/transfer data/etc., and the longer it takes to complete the work requested.
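
If you want to watch this while the job runs: PX player processes normally run as osh, so on a Linux engine host something like the line below shows their nice value, CPU share and elapsed time (verify the process name on your own install):

ps -C osh -o pid,ppid,ni,pcpu,etime,args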

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
srds2
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thank you for the explanation, that was so helpful.

The job that is taking longer has the structure below:

Input from an Oracle table (about 2 million records for both the shorter and the longer run)
Lookup against a reference table for the incoming 2 million records (the reference data is the same for both runs)
Update of a target table with the lookup results (about 2 million records for both runs)

So, can you please let me know if monitoring RAM and scratch disk is a good idea?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Always!
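
A minimal sketch of what to watch during a run, assuming a Unix/Linux engine tier (replace the scratch path with the resource scratchdisk entries from your own configuration file):

vmstat 10                 # memory, paging and CPU every 10 seconds
df -h /path/to/Scratch    # scratch disk fill level
iostat -x 10              # per-device throughput and utilization (needs sysstat)
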
- james wiles


All generalizations are false, including this one - Mark Twain.