Timed out while waiting for an event

jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

Hi,

Lately we're getting a lot of errors like this:
wCleanupCBOAggregates..JobControl (@AGR_CBO_ORIGINATING): Controller problem: Error calling DSRunJob(CleanupAgrCBO), code=-14 [Timed out while waiting for an event]
Before anyone mentions it: I did a search and found that this is caused by an overloaded system. I also found the post about ecase 70788 (a patch that raises the DSD.RUN timeout from 60 seconds to 600 seconds), which is of course a workaround, not a solution.

However, when I look at the load on our Unix server, it is not at its limits when these errors occur (I checked number of processes, CPU, memory and disk space), so it seems to be more of a DataStage overload than a server overload.


Does anyone have an idea about the deciding factor in this?
For example, is there a difference between:
- 50 jobs with 2 sequential stages being started together
- 2 jobs with 50 sequential stages being started together
- 2 jobs with 5 stages, each using 10 parallel processes

This way we can work out the best way to resolve this: do we mainly sequentialize (if that's a word?) the workflows to start fewer jobs in parallel, do we split jobs into multiple smaller jobs, or do we decrease the parallelism inside the jobs?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You should set the APT_DUMP_SCORE environment variable in order to see how many processes (PIDs) are actually started. That number depends on the node configuration in your APT_CONFIG_FILE as well as on whether or not your database is partitioned and you use that functionality.

Increasing or decreasing the number of nodes in your configuration file makes a significant difference to the number of processes fired off by PX, and in many cases it is more efficient to use a 1-node configuration (even on a system with many CPUs) than a configuration with 4 or more nodes.
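
To get a feel for how the node count drives the process count, here is a rough Python sketch. It is only an estimate under an assumed model that is not stated in this thread: each parallel job starts one conductor, one section leader per node, and one player per operator per node, with no operator combination. The function name and the mapping of the three scenarios from the question onto (jobs, operators, nodes) are my own assumptions; the score dump from APT_DUMP_SCORE remains the real source of truth.

```python
def estimate_processes(jobs: int, operators_per_job: int, nodes: int) -> int:
    """Rough upper bound on OS processes for `jobs` parallel jobs started together.

    Assumed model (hypothetical, not confirmed in this thread):
    1 conductor + 1 section leader per node + 1 player per operator per node,
    ignoring operator combination and sequential-only operators.
    """
    per_job = 1 + nodes + operators_per_job * nodes
    return jobs * per_job


# The three scenarios from the question, mapped onto the assumed model.
scenarios = {
    "50 jobs, 2 stages, 1 node": estimate_processes(50, 2, 1),
    "2 jobs, 50 stages, 1 node": estimate_processes(2, 50, 1),
    "2 jobs, 5 stages, 10 nodes": estimate_processes(2, 5, 10),
}

for label, count in scenarios.items():
    print(f"{label}: ~{count} processes")
```

Under these assumptions the many-small-jobs case starts the most processes (about 200, versus about 104 and 122), because every job pays the conductor and section leader overhead again.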
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use some of the other reporting environment variables to capture the process IDs of the player processes and their memory consumption. Relate these back to your UNIX system monitoring.
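
On the UNIX side, a minimal Python sketch of that monitoring could look like the following. It assumes that the parallel engine processes show up in `ps` under a command name containing `osh`, and that your `ps` accepts `-eo pid,rss,comm`; both are assumptions to verify on your platform, and the function names are purely illustrative.

```python
import subprocess


def summarize_ps(ps_text: str, name_filter: str = "osh"):
    """Return (process_count, total_rss_kb) for commands containing name_filter.

    Expects text in the layout of `ps -eo pid,rss,comm`: one header line,
    then one line per process with PID, resident set size and command name.
    """
    count = 0
    rss_total = 0
    for line in ps_text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[1].isdigit():
            continue  # ignore malformed lines
        _pid, rss, command = parts
        if name_filter in command:
            count += 1
            rss_total += int(rss)
    return count, rss_total


def snapshot(name_filter: str = "osh"):
    """Take one live sample; "osh" as the process name is an assumption."""
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return summarize_ps(result.stdout, name_filter)
```

Calling `snapshot()` at intervals while the sequence runs, and logging the result next to the time of each code=-14 error, shows whether the timeouts line up with peaks in process count or memory.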
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.