APT_PM_PLAYER_TIMING Env Variable enabled
Moderators: chulett, rschirm, roy
Hello all,
One of my DataStage parallel jobs runs for a long time, and the run time varies from day to day with the same number of input records. On a given day it runs for about 1 hour; on another day it runs for four hours. I have enabled the APT_PM_PLAYER_TIMING variable in Administrator to see the elapsed time of each step.
Director shows messages like the one below:
Operator completed. status: APT_StatusOk elapsed: 3225.97 user: 208.53 sys: 37.54 (total CPU: 246.07)
Can someone please tell me what "user:", "sys:" and "total CPU:" signify?
Thanks
Those indicate CPU time for the operator/process providing the message:
user: CPU time utilized by the process
sys: CPU time used by the operating system to support the process
Total CPU: user + sys
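If it helps to pull these figures out of the Director log programmatically, here is a quick sketch (my own helper, not part of DataStage) that parses an "Operator completed" message into its timing fields:

```python
import re

# Example Director message produced with APT_PM_PLAYER_TIMING enabled
LINE = ("Operator completed. status: APT_StatusOk elapsed: 3225.97 "
        "user: 208.53 sys: 37.54 (total CPU: 246.07)")

def parse_timing(line):
    # elapsed = wall-clock seconds; user/sys = CPU seconds; total CPU = user + sys
    fields = dict(re.findall(r"(elapsed|user|sys|total CPU): ([\d.]+)", line))
    return {key: float(value) for key, value in fields.items()}

t = parse_timing(LINE)
print(t)
# {'elapsed': 3225.97, 'user': 208.53, 'sys': 37.54, 'total CPU': 246.07}
```

Collecting these per-operator figures across several runs makes it easy to compare where the extra elapsed time is going.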
With time varying wildly as you describe, start by looking in the following areas:
1) Are the data volumes (number of bytes, not records) consistent from run to run? Each run may process 20 million records, but is it only 1GB one time and 5GB another?
2) How heavily is the DataStage server utilized during the different job runs? For example: is only your job running, or are 20 other jobs running at the same time as yours?
3) If you're using external sources/targets, such as databases, how heavily are they utilized during the different job runs?
4) Job and data characteristics: If the job does database updates, maybe the longer runs have more updates to the database than the shorter runs?
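Before checking any of the above, it's worth working out how much of the elapsed time the operator actually spent on the CPU. A rough gauge (my own sketch, not a DataStage feature) is total CPU time divided by elapsed time:

```python
def cpu_fraction(elapsed, user, sys):
    # Fraction of the wall-clock run the process spent on-CPU.
    # A low value means most of the time went to waiting: database,
    # disk, network, or the O/S scheduler sharing CPU with other jobs.
    return (user + sys) / elapsed

# Figures from the log line quoted earlier: on-CPU only ~7.6% of a
# 3226-second run, so the bulk of the elapsed time was spent waiting.
frac = cpu_fraction(3225.97, 208.53, 37.54)
print(f"{frac:.1%}")  # 7.6%
```

A fraction that drops further on the slow days, with CPU time unchanged, points at items 2 and 3 above rather than the job logic itself.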
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Thank you for the response.
What's really interesting in the job logs is that, between the longer run and the shorter run, there is no significant change in the values below for each step:
user: (Are these values in seconds or minutes?)
sys:
Total CPU:
The only change is the elapsed time.
For your questions:
1) The data volume is also the same.
2) I have to check this, which I assume is the major culprit.
3) I have already talked to the DBA group, and there are no significant changes on their side between a long run and a short run.
DataStage will attempt to start the job when it is submitted (through Director, Designer, or on the command line with dsjob). If sufficient resources are not available, either those configured within the DataStage server (uvconfig) or system resources, the job may abort. If the job doesn't abort due to resource availability, then it's at the mercy of the amount of CPU time the O/S can give its processes (and no, you can't change the priority of a job).
CPU usage values are in seconds. It's not surprising that, given the data volumes are roughly equal, there is little change in CPU usage... it indicates that the cause is likely external to the job itself.
Network and disk storage performance and contention can also come into play, especially in heavily utilized environments.
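To make that concrete, here is a small sketch (illustrative numbers of my own, not taken from your logs) showing why identical CPU time with very different elapsed time points outside the job:

```python
# Two runs of the same operator: same CPU work done, very different
# wall-clock time. The difference is entirely time spent off-CPU,
# i.e. waiting on the database, storage, network, or the scheduler.
runs = {
    "fast run": {"elapsed": 3600.0, "cpu": 246.0},
    "slow run": {"elapsed": 14400.0, "cpu": 246.0},
}

for name, r in runs.items():
    wait = r["elapsed"] - r["cpu"]
    print(f"{name}: {wait:.0f}s off-CPU ({wait / r['elapsed']:.0%} of the run)")
```

The slow run did no extra computing; all of the additional three hours were spent waiting on something the job does not control.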
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Thanks for the response James
(If the job doesn't abort due to resource availability, then it's at the mercy of the amount of CPU time the O/S can give its processes)
So, from what you said, there can be a case where the job still shows a running state without aborting even though the required resources are not available, and whenever the OS assigns the resources it starts its actual processing.
Partly correct.
The job won't enter a running state until all of the operators have been started successfully. If the O/S can't allocate the resources to start a process (operator), the job will abort. Once all of the operator processes have started, the operators initialize themselves, which can require additional resources (memory for buffers, communications links, files, etc.) and can cause job failure if the required resources are not available.
Assuming the job starts and initializes successfully, then it has to share resources (CPU, system memory, storage, etc.) with other jobs and processes running on the server. The more active processes on the server, the less time per second a process receives to execute/transfer data/etc., and the longer it takes to complete the work requested.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Thank you for the explanation, that was so helpful.
The job that is taking longer has the structure below:
Input from an Oracle table retrieving data (about 2 million records for both the shorter run and the longer run)
Lookup against a table with the incoming 2 million records (the reference data for the lookup is the same for both runs)
Update a table with the lookup data (about 2 million records for both the shorter run and the longer run)
So, can you please let me know if monitoring RAM and scratch disk is a good idea?