Hi all,
A DataStage parallel job keeps running in the PROD environment and never finishes.
The same code executes fine in the QA environment.
Even when I run the same job from QA and point it at the PROD target database, it runs fine.
Could you please suggest how to find the reason for a long-running job?
DS JOB Continuously running
What is the job doing? If you monitor it:
- Has the job processed any data?
- If so, has it processed all of the expected data before it "hangs"?
- Does it use any CPU in the "hang" state?
Other questions would depend upon the answers to these.
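One way to answer the CPU question from the engine host is to take a process snapshot with `ps` and look at the job's processes. The sketch below is only an illustration: it assumes the parallel engine processes show up with `osh` in their command line, and the function names and the 1% threshold are made up for this example. Note that `pcpu` as reported by `ps` is often averaged over the life of the process, so comparing two snapshots a minute apart (or watching `top`) gives a better picture.

```python
import subprocess

def parse_ps(output):
    """Parse 'pid pcpu args' lines into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), parts[2]))
        except ValueError:
            # Skip anything that does not look like a data row.
            continue
    return rows

def engine_processes(rows, marker="osh"):
    """Keep processes whose command line contains the marker.

    The marker is an assumption and a crude substring match; adjust it
    to whatever the job's processes look like on your server.
    """
    return [row for row in rows if marker in row[2]]

def looks_idle(rows, threshold=1.0):
    """True if there are matching processes and none exceeds the threshold."""
    return bool(rows) and all(cpu <= threshold for _, cpu, _ in rows)

def snapshot():
    """Run ps once and return the parsed rows (headers suppressed)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "pcpu=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling `looks_idle(engine_processes(snapshot()))` while the job is "running" gives a rough yes/no: processes present but using no CPU suggests the job is waiting on something rather than working.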
Since you have not told us what your job is actually doing, apart from the fact that it performs some kind of write operation on some kind of database, it is pretty hard to give any advice whatsoever.
Thanks for your reply, Arnd.
- No data has been processed.
- The CPU is 97% idle.
- According to the log, the job hangs while invoking the startup script: main_program: APT Startup script: /opt/etl/InformationServer/Server/PXEngine/etc/startup.apt
After adding the environment variable $APT_STARTUP_STATUS = True, the log is as shown below, and the job is still in the running state:
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: APT configuration file: /opt/etl/InformationServer/Server/Configurations/prod1x.apt (...)
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Max inbound connections per node = 3 (...)
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Defining Section Leaders.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Contacting Section Leaders.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: APT Startup script: /opt/etl/InformationServer/Server/PXEngine/etc/startup.apt
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Broadcasting score.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: node_node1: broadcastStepIR: score load from /tmp/APTps9380286e3cfd8f on node node1 started.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Score (416,891 bytes) sent.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Starting Players.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Waiting for Players to start.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Setting up data connections among Players.
Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info
Event: main_program: Starting step execution.
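When a log like this is long, it can help to pull out the last entry and how long ago it was written. The following is a rough sketch that assumes the log has been copied to text in exactly the two-line "Occurred: / Event:" layout shown above; the function names are invented for this example and are not part of DataStage.

```python
import re
from datetime import datetime

# Matches header lines such as:
# "Occurred: 8:28:28 AM On date: 9/24/2012 Type: Info"
HEADER = re.compile(
    r"^Occurred:\s*(?P<time>.+?)\s+On date:\s*(?P<date>\S+)\s+Type:\s*(?P<type>\w+)$"
)

def parse_entries(text):
    """Turn the pasted log text into a list of dicts, in log order."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(
                match["date"] + " " + match["time"], "%m/%d/%Y %I:%M:%S %p"
            )
            current = {"when": stamp, "type": match["type"], "event": ""}
            entries.append(current)
        elif current is not None and line.startswith("Event:"):
            current["event"] = line[len("Event:"):].strip()
    return entries

def last_entry(entries):
    """Return the final entry; the log is assumed to be chronological."""
    return entries[-1] if entries else None

def stalled_for(entries, now):
    """Time elapsed between the final log entry and 'now'."""
    return now - entries[-1]["when"]
```

Applied to the log above, the last entry is "main_program: Starting step execution.", so the startup messages had all been written by the time the job stopped logging.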
BI-RMA wrote: Since You do not provide any information as to what your job is actually doing - except from the fact that You perform some kind of write-operation on some kind of database - it is pretty hard to give any advice whatsoever.
Oracle Source -> Transformer (1-1 mapping) -> Oracle Target (Upsert mode, inserting only new records) -> Peek stage (to capture the duplicate/updated records rejected from the source)
Can you make a copy of the job with just the source Oracle stage going to a Peek stage and try running that to see whether the problem is still reproducible?
Also, does that Oracle stage contain SQL or even before/after code?
If you add the environment variable $APT_DISABLE_COMBINATION to the original job and set it to "True", does the error picture remain the same?