DS Job in not completing.
Moderators: chulett, rschirm, roy
DS Job in not completing.
DS Job Design is SEQfile -> MdfyStg -> TrnasfrStg -> FunlStg -> SQLEnterpriseStg.
Functionality:- The job reads 1 file at a time and loads into target table.
I am maintaining a different table called AuditTable , which is handled by BeforeJob & AfterJob SubRoutines ( Collects, file count, file name, no of rows loaded , tgt table, any warning message etc ).
This job is 24/7 ( handled by a Sequence job.) The job is running fine from past 5 months. But recently it is getting hanged in middle at below point...
Environment variable settings: (..)
Parallel job initiated
main_program: IBM InfoSphere DataStage Enterprise Edition 8.7.0.6118 (..)
main_program: conductor uname: -s=Windows_NT; -r=1; -v=6; -n=HODSENGD01; -m=Pentium
main_program: orchgeneral: loaded (orchsort: loaded)
sqlTgtAMBSwrt: Process meta data not available in database
main_program: APT configuration file: C:/IBM/InformationServer/Server/Configurations/default.apt
(not processing further )
Please help.
Functionality:- The job reads 1 file at a time and loads into target table.
I am maintaining a different table called AuditTable , which is handled by BeforeJob & AfterJob SubRoutines ( Collects, file count, file name, no of rows loaded , tgt table, any warning message etc ).
This job is 24/7 ( handled by a Sequence job.) The job is running fine from past 5 months. But recently it is getting hanged in middle at below point...
Environment variable settings: (..)
Parallel job initiated
main_program: IBM InfoSphere DataStage Enterprise Edition 8.7.0.6118 (..)
main_program: conductor uname: -s=Windows_NT; -r=1; -v=6; -n=HODSENGD01; -m=Pentium
main_program: orchgeneral: loaded (orchsort: loaded)
sqlTgtAMBSwrt: Process meta data not available in database
main_program: APT configuration file: C:/IBM/InformationServer/Server/Configurations/default.apt
(not processing further )
Please help.
Re: DS Job in not completing.
I had this issue good number of times, but not able to find the cause. This will happen to some specific job(s), you need to raise a PMR is the best way to resolve this issue.
The problem is, your day to day prod run will not complete successfully due to this reason. Better to raise PMR.
The problem is, your day to day prod run will not complete successfully due to this reason. Better to raise PMR.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Thanks ArndW & SURA...
ArndW:
I am using the DSLogInfo() -- "Record inserted successfully into Job Audit Table"; In fact this statemtent should pop out to DS log if record inserted successfully. If no return code from then go to ErrorExit section in routine.
To check more log, I used $APT_STARTUP_STATUS = True;
Here is the full log...
Starting Job djpXTMR_AMBSLoad.
djpXTMR_AMBSLoad..BeforeJob (JOBNAME): The job name is :djpXTMR_AMBSLoad
djpXTMR_AMBSLoad..BeforeJob (USERNAME): The user id running this job is :hsbc5423
djpXTMR_AMBSLoad..BeforeJob (BeforeJobStats): " At this point getting last max(time) of the job in Audit Table "
djpXTMR_AMBSLoad..BeforeJob (GetLatestTime): The return value of query is :
--------------------------------------
2013-01-02 10:32:49.1000000
(1 rows affected)
djpXTMR_AMBSLoad..BeforeJob (GetLastStatus): Got the status of last job run successfully from AUDIT_CREDIT_SYSTEMS table
djpXTMR_AMBSLoad..BeforeJob (BeforeJobStats): " At this point insert statement into Audit table "
djpXTMR_AMBSLoad..BeforeJob (BeforeJobStats): Record inserted successfully into Job Audit Table
Environment variable settings: (..)
Parallel job initiated
main_program: IBM InfoSphere DataStage Enterprise Edition 8.7.0.6118 (..)
main_program: orchgeneral: loaded(orchsort: loaded, orchstats: loaded)
sqlTgtXTMR_AMBSwrt: Process meta data not available in database
main_program: APT configuration file: D:/IBM/InformationServer/Server/Configurations/default.apt
main_program: Max inbound connections per node = 12, Max outbound connections per node = 12
main_program: Defining Section Leaders.
main_program: Contacting Section Leaders.
main_program: Broadcasting score.
main_program: Score (116,767 bytes) sent.
main_program: Starting Players.
main_program: Waiting for Players to start.
main_program: Setting up data connections among Players.
main_program: Starting step execution.
main_program: Waiting for step completion.
(No further processing from this point...)
ArndW:
I am using the DSLogInfo() -- "Record inserted successfully into Job Audit Table"; In fact this statemtent should pop out to DS log if record inserted successfully. If no return code from then go to ErrorExit section in routine.
To check more log, I used $APT_STARTUP_STATUS = True;
Here is the full log...
Starting Job djpXTMR_AMBSLoad.
djpXTMR_AMBSLoad..BeforeJob (JOBNAME): The job name is :djpXTMR_AMBSLoad
djpXTMR_AMBSLoad..BeforeJob (USERNAME): The user id running this job is :hsbc5423
djpXTMR_AMBSLoad..BeforeJob (BeforeJobStats): " At this point getting last max(time) of the job in Audit Table "
djpXTMR_AMBSLoad..BeforeJob (GetLatestTime): The return value of query is :
--------------------------------------
2013-01-02 10:32:49.1000000
(1 rows affected)
djpXTMR_AMBSLoad..BeforeJob (GetLastStatus): Got the status of last job run successfully from AUDIT_CREDIT_SYSTEMS table
djpXTMR_AMBSLoad..BeforeJob (BeforeJobStats): " At this point insert statement into Audit table "
djpXTMR_AMBSLoad..BeforeJob (BeforeJobStats): Record inserted successfully into Job Audit Table
Environment variable settings: (..)
Parallel job initiated
main_program: IBM InfoSphere DataStage Enterprise Edition 8.7.0.6118 (..)
main_program: orchgeneral: loaded(orchsort: loaded, orchstats: loaded)
sqlTgtXTMR_AMBSwrt: Process meta data not available in database
main_program: APT configuration file: D:/IBM/InformationServer/Server/Configurations/default.apt
main_program: Max inbound connections per node = 12, Max outbound connections per node = 12
main_program: Defining Section Leaders.
main_program: Contacting Section Leaders.
main_program: Broadcasting score.
main_program: Score (116,767 bytes) sent.
main_program: Starting Players.
main_program: Waiting for Players to start.
main_program: Setting up data connections among Players.
main_program: Starting step execution.
main_program: Waiting for step completion.
(No further processing from this point...)
Odd indeed. I can't check now, but there are a number of additional logging parameters you could set including APT_DUMP_SCORE, and then variables something like 'report player calls' and 'report process ids', etc. Perhaps that might help you narrow down exactly which call from which process is the last to be executed.
On a similar note, do you have "osh.exe" processes on the datastage server when this happens?
On a similar note, do you have "osh.exe" processes on the datastage server when this happens?
Added the APT variables;
main_program: This step has 6 datasets:
ds0: {op0[1p] (sequential fsqSrcAMBSsel)
eAny<>eCollectAny
op1[2p] (parallel APT_CombinedOperatorController:{natural=mdf, synthetic="modify(0)"})}
ds1: {op1[2p] (parallel APT_CombinedOperatorController:APT_TransformOperatorImplV47S7_djpXTMR_AMBSLoad_tfpAMBS in tfpAMBS)
eAny=>eCollectAny
op2[2p] (parallel buffer(1))}
ds2: {op1[2p] (parallel APT_CombinedOperatorController:APT_TransformOperatorImplV47S7_djpXTMR_AMBSLoad_tfpAMBS in tfpAMBS)
eAny=>eCollectAny
op3[2p] (parallel buffer(0))}
ds3: {op2[2p] (parallel buffer(1))
eSame=>eCollectAny
op4[2p] (parallel fnlAMBS)}
ds4: {op3[2p] (parallel buffer(0))
eSame=>eCollectAny
op4[2p] (parallel fnlAMBS)}
ds5: {op4[2p] (parallel fnlAMBS)
eAny=>eCollectAny
op5[2p] (parallel APT_ODBCUpsertOperator in odbTgtXTMR_AMBSwrt)}
It has 6 operators:
op0[1p] {(sequential fsqSrcAMBSsel)
on nodes (
node1[op0,p0]
)}
op1[2p] {(parallel APT_CombinedOperatorController:
({natural=mdf, synthetic="modify(0)"})
(APT_TransformOperatorImplV47S7_djpXTMR_AMBSLoad_tfpAMBS in tfpAMBS)
) on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[2p] {(parallel buffer(1))
on nodes (
node1[op2,p0]
node2[op2,p1]
)}
op3[2p] {(parallel buffer(0))
on nodes (
node1[op3,p0]
node2[op3,p1]
)}
op4[2p] {(parallel fnlAMBS)
on nodes (
node1[op4,p0]
node2[op4,p1]
)}
op5[2p] {(parallel APT_ODBCUpsertOperator in odbTgtXTMR_AMBSwrt)
on nodes (
node1[op5,p0]
node2[op5,p1]
)}
It runs 11 processes on 2 nodes.
main_program: APT_PMsectionLeader: new Section Leader running, PID = 39636, note that this PID may be transient, refer to TFPM-00201 for the final Section Leader PID.
APT_PMsectionLeader: new Section Leader running, PID = 44888, note that this PID may be transient, refer to TFPM-00201 for the final Section Leader PID.
fnlAMBS,1: APT_PMPlayer: new Player running, PID = 38704, spawned from Section Leader, PID = 44888
odbTgtXTMR_AMBSwrt,1: APT_PMPlayer: new Player running, PID = 51280, spawned from Section Leader, PID = 44888
buffer(0),1: APT_PMPlayer: new Player running, PID = 47472, spawned from Section Leader, PID = 44888
buffer(1),1: APT_PMPlayer: new Player running, PID = 55416, spawned from Section Leader, PID = 44888
APT_CombinedOperatorController,1: APT_PMPlayer: new Player running, PID = 32700, spawned from Section Leader, PID = 44888
fsqSrcAMBSsel,0: APT_PMPlayer: new Player running, PID = 10988, spawned from Section Leader, PID = 39636
fnlAMBS,0: APT_PMPlayer: new Player running, PID = 23728, spawned from Section Leader, PID = 39636
odbTgtXTMR_AMBSwrt,0: APT_PMPlayer: new Player running, PID = 53752, spawned from Section Leader, PID = 39636
buffer(0),0: APT_PMPlayer: new Player running, PID = 56252, spawned from Section Leader, PID = 39636
buffer(1),0: APT_PMPlayer: new Player running, PID = 18332, spawned from Section Leader, PID = 39636
(No further processing from this point...)
main_program: This step has 6 datasets:
ds0: {op0[1p] (sequential fsqSrcAMBSsel)
eAny<>eCollectAny
op1[2p] (parallel APT_CombinedOperatorController:{natural=mdf, synthetic="modify(0)"})}
ds1: {op1[2p] (parallel APT_CombinedOperatorController:APT_TransformOperatorImplV47S7_djpXTMR_AMBSLoad_tfpAMBS in tfpAMBS)
eAny=>eCollectAny
op2[2p] (parallel buffer(1))}
ds2: {op1[2p] (parallel APT_CombinedOperatorController:APT_TransformOperatorImplV47S7_djpXTMR_AMBSLoad_tfpAMBS in tfpAMBS)
eAny=>eCollectAny
op3[2p] (parallel buffer(0))}
ds3: {op2[2p] (parallel buffer(1))
eSame=>eCollectAny
op4[2p] (parallel fnlAMBS)}
ds4: {op3[2p] (parallel buffer(0))
eSame=>eCollectAny
op4[2p] (parallel fnlAMBS)}
ds5: {op4[2p] (parallel fnlAMBS)
eAny=>eCollectAny
op5[2p] (parallel APT_ODBCUpsertOperator in odbTgtXTMR_AMBSwrt)}
It has 6 operators:
op0[1p] {(sequential fsqSrcAMBSsel)
on nodes (
node1[op0,p0]
)}
op1[2p] {(parallel APT_CombinedOperatorController:
({natural=mdf, synthetic="modify(0)"})
(APT_TransformOperatorImplV47S7_djpXTMR_AMBSLoad_tfpAMBS in tfpAMBS)
) on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[2p] {(parallel buffer(1))
on nodes (
node1[op2,p0]
node2[op2,p1]
)}
op3[2p] {(parallel buffer(0))
on nodes (
node1[op3,p0]
node2[op3,p1]
)}
op4[2p] {(parallel fnlAMBS)
on nodes (
node1[op4,p0]
node2[op4,p1]
)}
op5[2p] {(parallel APT_ODBCUpsertOperator in odbTgtXTMR_AMBSwrt)
on nodes (
node1[op5,p0]
node2[op5,p1]
)}
It runs 11 processes on 2 nodes.
main_program: APT_PMsectionLeader: new Section Leader running, PID = 39636, note that this PID may be transient, refer to TFPM-00201 for the final Section Leader PID.
APT_PMsectionLeader: new Section Leader running, PID = 44888, note that this PID may be transient, refer to TFPM-00201 for the final Section Leader PID.
fnlAMBS,1: APT_PMPlayer: new Player running, PID = 38704, spawned from Section Leader, PID = 44888
odbTgtXTMR_AMBSwrt,1: APT_PMPlayer: new Player running, PID = 51280, spawned from Section Leader, PID = 44888
buffer(0),1: APT_PMPlayer: new Player running, PID = 47472, spawned from Section Leader, PID = 44888
buffer(1),1: APT_PMPlayer: new Player running, PID = 55416, spawned from Section Leader, PID = 44888
APT_CombinedOperatorController,1: APT_PMPlayer: new Player running, PID = 32700, spawned from Section Leader, PID = 44888
fsqSrcAMBSsel,0: APT_PMPlayer: new Player running, PID = 10988, spawned from Section Leader, PID = 39636
fnlAMBS,0: APT_PMPlayer: new Player running, PID = 23728, spawned from Section Leader, PID = 39636
odbTgtXTMR_AMBSwrt,0: APT_PMPlayer: new Player running, PID = 53752, spawned from Section Leader, PID = 39636
buffer(0),0: APT_PMPlayer: new Player running, PID = 56252, spawned from Section Leader, PID = 39636
buffer(1),0: APT_PMPlayer: new Player running, PID = 18332, spawned from Section Leader, PID = 39636
(No further processing from this point...)
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
I added it and tested.. but no luck...
it got again stucked ... it is waiting for some thing.. which I am not able to figure out.....
If I removed the BeforeJob/AfterJob Sub-Routines, then it is working. If you say, that there is problem with BeforeJob Routine,then, it was clear from logs.. that control returned from routine, and the job got initiated.. well.. but stucked....
![Sad :(](./images/smilies/icon_sad.gif)
If I removed the BeforeJob/AfterJob Sub-Routines, then it is working. If you say, that there is problem with BeforeJob Routine,then, it was clear from logs.. that control returned from routine, and the job got initiated.. well.. but stucked....
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
Thanks Zulfi...
If you say that there might be problem with target db, i tested the job even without before/after routines, the job successfully loaded the records into the target table.
So do you think it more concerns over target table ?
More over, I used the SHOW_PIDS.. and all processes were running on both nodes... if I see the log, for each of the stage, (stagename,0):APT_PMPlayer and (stagename,1):APT_PMPlayer, pids.. were running.. Is this correct way to check if all processes were running or is there any other way to check ??
If you say that there might be problem with target db, i tested the job even without before/after routines, the job successfully loaded the records into the target table.
So do you think it more concerns over target table ?
More over, I used the SHOW_PIDS.. and all processes were running on both nodes... if I see the log, for each of the stage, (stagename,0):APT_PMPlayer and (stagename,1):APT_PMPlayer, pids.. were running.. Is this correct way to check if all processes were running or is there any other way to check ??
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
The log shows that pid's are created and then the log doesn't get updated so you can take the player pid's and go to the node where they are running and run ps command to see if all those pid's keep running or did the player pid's finish after execution and then check if the section leader finish too.
Its basically a trial and error approach to narrow down to the root cause as to what is causing the issue and even if its found the solution of which might end with PMR
Its basically a trial and error approach to narrow down to the root cause as to what is causing the issue and even if its found the solution of which might end with PMR
- Zulfi