Sequencer Job is aborted in DS7.5.1
Moderators: chulett, rschirm, roy
Hi All,
I am building Sequencer jobs for the initial load of our DW, but I have a problem with one of them.
I have an entry-point sequencer job. That sequencer job calls child sequencer jobs, and each child sequencer job runs server jobs or parallel jobs.
When I run the entry-point sequencer job, it aborts after the first child sequencer job, so the next-step sequencer jobs never start.
At that point, each child sequencer job logged the message below:
BatchIDi20..JobControl (@SDIEWFA02301): Controller problem: Error calling DSRunJob(SDIEWFA02301), code=-14
[Timed out while waiting for an event]
Each child sequencer job wrote this message only a few seconds after being started by the parent sequencer job.
I have already increased the MFILES and T30FILE parameters in uvconfig and applied the change by restarting the DS daemon, but I still have no idea how to resolve this symptom.
I need your help.
Thanks in advance.
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Sainath.Srinivasan wrote: Do you run in multiple instance?
No, I did not use multiple instances, but this entry-point sequencer job runs more than 30 server jobs concurrently, so I wonder whether the DataStage engine could not start some of the child jobs.
I am using Unix (16 CPUs, 60 GB of memory) for the DataStage process, so I wonder why DataStage did not fork the child jobs. DS just gave the message:
Error calling DSRunJob(SDIEWFA02301), code=-14
[Timed out while waiting for an event]
What does this message mean?
Thanks in advance.
Does the same server job time out each time? I suspect it does not; could you stagger your 30 concurrent calls by making some of them depend on others finishing? Also, monitor your CPU usage while these are running; vmstat should be detailed enough.
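To make the monitoring concrete, here is a rough sketch of my own (not from any DataStage manual). It assumes the Linux procps vmstat layout, where CPU idle is field 15; the column position differs on AIX and Solaris, so check `man vmstat` on your platform:

```shell
# flag_low_idle THRESHOLD  -- read vmstat-style output on stdin and print
# any sample whose CPU idle column ($15 on Linux) falls below THRESHOLD.
flag_low_idle() {
    threshold=${1:-5}
    # Skip the two vmstat header lines, then test the idle column of each sample.
    awk -v t="$threshold" 'NR > 2 && $15 + 0 < t + 0 { print "low idle:", $15 "%" }'
}

# Typical use while the sequence runs: one sample every 5 seconds for 5 minutes.
#   vmstat 5 60 | flag_low_idle 5
```

If low-idle lines show up around the times the child jobs are being started, CPU saturation is the likely cause of the code=-14 timeouts.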
ArndW wrote: Does the same server job time out each time? I suspect it does not; could you stagger your 30 concurrent calls by making some of them depend on others finishing? Also, monitor your CPU usage while these are running; vmstat should be detailed enough.
Each job is a different one, and this machine has 30 CPUs, so I think it has enough hardware resources. Anyway, I wonder why the DataStage job aborted just after starting.
Which DataStage parameter is related to this symptom, and how can I prevent this error?
I have about 2000 initial-load jobs, and I have already made sequencer jobs to handle the dependencies between them.
The error message means that DataStage has waited longer than expected for a job to start; this is most likely due to the system's resources being bottlenecked during the initial startup phase.
Please monitor your CPU usage when the job starts; if it is over 95% for periods of 10-15 seconds, then this is the most likely cause. The request to change your sequence is not a final solution, just a way to narrow down the cause - if the error goes away, you can see the relationship and work from there.
The number of CPUs might not be what is limiting you. It could be virtual memory space, disk I/O (on the partition with DataStage), or even your DataStage configuration (T30FILE is not the culprit here, but did you change any other configuration parameters?).
dhwankim,
could you explain the JdDSSJOBUpdate_T1_JC_JOB_PARAMETERS_Hf part - I'm afraid I don't know what you mean.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Hello DaeHwan,
Your monitoring will show that each job uses more than 50% of a CPU if run separately. This tells you, by simple arithmetic, that 30 jobs on 16 CPUs is overloading the machine. This is why you must run fewer jobs at a time to overcome this problem.
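As a sketch of "fewer jobs at a time" (my own illustration, not part of DataStage; `run_ds_job` is a hypothetical wrapper around `dsjob -run`, and the job names are placeholders), a driver script could launch the children in batches:

```shell
# run_in_batches CMD BATCH JOB...  -- invoke "CMD job" for each job name,
# launching at most BATCH jobs in parallel and waiting for each batch to
# finish before starting the next.
run_in_batches() {
    cmd=$1; batch=$2; shift 2
    count=0
    for job in "$@"; do
        "$cmd" "$job" &                 # launch one child in the background
        count=$((count + 1))
        if [ "$count" -ge "$batch" ]; then
            wait                        # block until the whole batch completes
            count=0
        fi
    done
    wait                                # pick up any partial final batch
}

# Hypothetical usage: run 30 children no more than 8 at a time.
#   run_in_batches run_ds_job 8 JOB_01 JOB_02 ... JOB_30
```

The same effect can be had inside the Designer by chaining some job activities after others instead of triggering all 30 from one point.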
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JdDSSJOBUpdate_T1_JC_JOB_PARAMETERS_Hf is just one of the server jobs.
The job reads a sequential file, transforms the data, looks up some hashed files, and writes to a sequential file. It's a plain job.
I now recognize that my problem is with hardware resource usage.
But I wonder how to protect a job from aborting when hardware resources run short. That is, when the server does not have enough free resources, how can I prevent jobs from being aborted?
Does UniVerse (the DS Engine) have any parameter related to this symptom?
And I would like to know which parameter in the Unix kernel, or anywhere else, could control this behaviour.
Thanks in advance.
Unfortunately, the only detection in DataStage is the timeout when a job fails to start within a hard-coded interval; that is, we can't tune the timeout. And, as you noted, the job that cannot start aborts.
You could probably do something with UNIX, but there's nothing supplied "out of the box" as far as I am aware. I am thinking of a shell script that takes one or two measures of %idle, and only proceeds if these are non-zero, indicating that the machine has spare capacity.
Of course, it may also be some other resource, such as memory (set a threshold on page faults per second) or I/O capacity. These checks would have to be done on a per-machine basis, since every machine is different.
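A sketch of that idea (my own illustration, untested against a loaded system; CPU idle is field 15 in Linux procps vmstat output and the column differs on other Unixes):

```shell
# wait_for_idle THRESHOLD  -- block until the CPU idle percentage rises
# above THRESHOLD percent, then return. Takes the second of two 5-second
# vmstat samples, since the first sample reports averages since boot.
wait_for_idle() {
    threshold=${1:-10}
    while :; do
        # The last vmstat line is the fresh 5-second sample; $15 is idle on Linux.
        idle=$(vmstat 5 2 | awk 'END { print $15 }')
        [ "$idle" -gt "$threshold" ] && return 0
        sleep 10    # machine looks saturated; back off before re-checking
    done
}

# Hypothetical usage in the wrapper that launches each child job:
#   wait_for_idle 10
#   dsjob -run -jobstatus MyProject SDIEWFA02301
```

The threshold, sample length, and back-off would all need tuning per machine, as noted above.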
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: Unfortunately, the only detection in DataStage is the timeout when a job fails to start within a hard-coded interval; that is, we can't tune the timeout. And, as you noted, the job that cannot start aborts.
You could probably do something with UNIX, but there's nothing supplied "out of the box" as far as I am aware. I am thinking of a shell script that takes one or two measures of %idle, and only proceeds if these are non-zero, indicating that the machine has spare capacity.
Of course, it may also be some other resource, such as memory (set a threshold on page faults per second) or I/O capacity. These checks would have to be done on a per-machine basis, since every machine is different.
Thank you, Ray, for your advice and tips.