DSXchange

Posted: **Tue Jan 23, 2007 5:20 am**

Are there any constraints on the number of parallel jobs that can be invoked from a sequence? We seem to get job failures when more than 20 jobs are invoked in parallel from a sequence.

Posted: **Tue Jan 23, 2007 5:46 am**

The limitation is not in the sequence itself, but might be related to the amount of resources that 20 concurrent PX jobs might use on your system. What sort of job failure causes do you get?

Posted: **Tue Jan 23, 2007 5:46 am**

The constraints are forced by resources available on your machine.
What kind of error messages are reported?

Posted: **Tue Jan 23, 2007 7:03 am**

Timeouts and the dreaded -14 error when trying to start jobs are signs of a resource-constrained system...

Posted: **Tue Jan 23, 2007 8:05 am**

My understanding of UNIX is that there is no limit to how many processes can be started at any one time, as UNIX will then share system resources between them all. Therefore I am confused as to why DS can have trouble starting jobs.

Posted: **Tue Jan 23, 2007 8:08 am**

There is a limit. A UNIX admin can explain this limit and its constraints to you better. They have to tune the kernel appropriately to support massive simultaneous processing. And if you push too much, you can reach resource limitations beyond the kernel tuning.
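As a rough illustration of the point that limits exist, here is a minimal Python sketch that reads the per-user process limit on a UNIX-like system. It uses the standard `resource` module; `RLIMIT_NPROC` is not defined on every platform, so the sketch checks for it first. The helper names `process_limits` and `describe` are made up for this example, and this is only one of several limits an admin might tune.

```python
import resource

def process_limits():
    """Return the (soft, hard) per-user process limit, or None if unsupported."""
    limit_id = getattr(resource, "RLIMIT_NPROC", None)
    if limit_id is None:
        return None  # this platform does not expose a process-count limit
    return resource.getrlimit(limit_id)

def describe(value):
    """Render a limit value, showing the 'no limit' marker as text."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

limits = process_limits()
if limits is None:
    print("RLIMIT_NPROC is not available on this platform")
else:
    soft, hard = limits
    print("soft limit:", describe(soft), "hard limit:", describe(hard))
```

Even when this reports "unlimited", memory and CPU still bound what the machine can actually run.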

Posted: **Tue Jan 23, 2007 9:07 am**

Pete Morris wrote:...no limit to how many processes can be started at any one time as unix will then share system resource between them all...

Just because a given UNIX machine is nominally able to start 2^8 or 2^16 PIDs according to its configuration doesn't mean that it actually can.
All systems have a memory limit. This is usually the sum of physical memory in the machine and some amount of other, usually disk, storage reserved to hold swapped-out memory. Each PID shares a bit of memory with others but also has its own private memory space. If we were to assume that each process uses 1 MB of memory (a very conservative number considering what they will be doing) then a machine that has 512 MB of main memory and another 512 MB of swap space could run about 1000 processes (assuming the operating system itself doesn't use any of this).

So by no means is the number of processes on any system unlimited, even if these are doing nothing at all.
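The memory arithmetic above can be written out as a small Python sketch. The function name `max_processes` and the `os_reserved_mb` parameter are made up for illustration, and the 128 MB and 50 MB figures in the usage lines are arbitrary assumptions, not measurements of any real system.

```python
def max_processes(physical_mb, swap_mb, per_process_mb, os_reserved_mb=0):
    """Upper bound on process count when memory is the only constraint."""
    usable_mb = physical_mb + swap_mb - os_reserved_mb
    return int(usable_mb // per_process_mb)

# The figures from the post: 512 MB RAM, 512 MB swap, 1 MB per process.
print(max_processes(512, 512, 1))

# Same machine, assuming (arbitrarily) the OS keeps 128 MB for itself.
print(max_processes(512, 512, 1, os_reserved_mb=128))

# Same machine, assuming (arbitrarily) 50 MB per process.
print(max_processes(512, 512, 50))
```

The bound drops quickly as the per-process footprint grows, which is why a realistic estimate needs measured memory use rather than the conservative 1 MB.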

Taking this just one step further, each process needs to consume CPU. A simple model for the 1000-process system theorized above would have each one use 1/1000 of the available CPU. But there are also a number of UNIX or other OS processes that need their share. And since chances are that the process's memory has been pushed out to disk (since system memory is full), the OS needs CPU cycles to locate some other process that it can swap out, then copy that process's settings to swap and then read the current one back into memory. All of this takes so long (more than 1/1000 of the available CPU) that by the time a process gets to be executed it is already time for it to be pushed out in favor of another. This escalates until the system is thrashing and effectively spending 100% of its time maintaining itself - sort of like big government.

To bring this back to your probable problem - if the system starts so many active processes that they contend for scarce resources (CPU, I/O, memory, database locks, etc.) at the same time, it slows down dramatically; and there are hard-coded timers in DataStage as well as other applications that come into effect when the machine is that slow.
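One general way to keep too many active processes from contending at once is to cap how many run simultaneously. The Python sketch below shows that idea with a bounded worker pool. It is not a DataStage API: `run_job` is a hypothetical stand-in for starting one job and waiting for it, and the cap of 5 is an arbitrary assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 5  # assumed cap; the right value depends on the machine

_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of jobs observed running at once

def run_job(name):
    """Hypothetical stand-in for starting one job and waiting for it to end."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.05)  # pretend the job does some work
    with _lock:
        _running -= 1
    return name

def run_all(job_names, max_concurrent=MAX_CONCURRENT):
    """Run every job, but never more than max_concurrent at the same time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_job, job_names))

results = run_all([f"job_{i}" for i in range(20)])
print(len(results), "jobs finished; peak concurrency:", peak_running)
```

All 20 jobs still complete, only in waves rather than all at once, which trades some elapsed time for a machine that stays responsive.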

Posted: **Thu Jan 25, 2007 4:04 am**

What are the timeout mechanisms, and is there a way to override them?

Posted: **Thu Jan 25, 2007 4:53 am**

This thread doesn't really point out any timeouts directly. There are some hard-coded values in DataStage that the engineers put in, thinking that no normal system could hit them during processing. As we all know, systems are often used in ways that designers and engineers don't consider, and thus what seems to be a reasonable 60-second maximum wait time for a process to send back its "I'm alive" message {which usually comes back in milliseconds} is no longer sufficient. This can be the case when starting up jobs and can trigger the -14 timeout message; IBM recently brought out a workaround (I wouldn't call this a bug, so the solution really isn't a "patch" but an enhancement) to avoid this timeout. But I wouldn't recommend putting in changes like this - it is much more important to try to avoid such timeouts, either by redesign or perhaps through hardware reconfiguration or upgrades, since if these limits are reached the system is probably so overloaded that response times will be abysmal and system overhead will take up more cycles than actual processing.

The most common timeout is with IPC, and these values can be changed by us. It almost never makes any sense to change the actual buffer sizes, and I've rarely seen cases where there is a valid reason to increase the timeout defaults significantly.
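To make the "I'm alive" wait concrete, here is a generic Python sketch of a starter that waits a bounded time for a child to report in. It is not how DataStage implements its timers; `wait_for_alive`, `fast_start` and `slow_start` are hypothetical names, and the sub-second timeouts stand in for the 60-second figure mentioned above.

```python
import threading
import time

def wait_for_alive(start_process, timeout_seconds):
    """Start a worker and wait up to timeout_seconds for its alive signal.

    Returns True if the signal arrived in time, False if the wait timed out.
    """
    alive = threading.Event()
    worker = threading.Thread(target=start_process, args=(alive,), daemon=True)
    worker.start()
    return alive.wait(timeout_seconds)

def fast_start(alive):
    """Simulates a healthy system: the signal comes back almost at once."""
    alive.set()

def slow_start(alive):
    """Simulates an overloaded system: the signal arrives too late."""
    time.sleep(0.5)
    alive.set()

print("healthy system:", wait_for_alive(fast_start, 0.2))
print("overloaded system:", wait_for_alive(slow_start, 0.1))
```

Raising the timeout would make the second call succeed, but the worker would still be just as slow, which is the argument above for fixing the load rather than the timer.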

Posted: **Thu Jan 25, 2007 2:59 pm**

When I was learning operating system (PRIMOS) tuning in nineteen mumble mumble I was told that the optimum point is "just before the machine starts thrashing". That, of course, is a moving target, but the usual method was to ramp up the parameter in question until thrashing began, then back off a bit. This was done under (perhaps simulated) "normal" or "heavy" load conditions.

Note how accurately this is quantified. Not.

You might ask - demand to know - how long it will take to drive from point A to point B. The answer will depend on many factors, over only some of which you have any control. And the answer may be different at different times.
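The ramp-up-then-back-off method described in this post can be sketched in Python as follows. Everything here is illustrative: `find_operating_point` is a made-up helper, a drop in throughput is used as a crude stand-in for "thrashing began", and `synthetic_throughput` is an invented model rather than a measurement.

```python
def find_operating_point(measure_throughput, start=1, step=1, limit=100, back_off=1):
    """Raise a parameter until throughput drops, then back off a bit.

    measure_throughput is a callable that takes the parameter value and
    returns a throughput figure observed under representative load.
    """
    best_value = start
    best_throughput = measure_throughput(start)
    value = start + step
    while value <= limit:
        throughput = measure_throughput(value)
        if throughput < best_throughput:
            break  # performance fell: treat this as the onset of thrashing
        best_value, best_throughput = value, throughput
        value += step
    return max(start, best_value - back_off)

def synthetic_throughput(n, capacity=12):
    """Invented model: gains up to capacity, then a steep decline."""
    return n if n <= capacity else capacity - 3 * (n - capacity)

chosen = find_operating_point(synthetic_throughput)
print("chosen setting:", chosen)
```

On a real machine the measurements are noisy and load-dependent, so the result would shift from run to run, as the post says.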

Posted: **Thu Jan 25, 2007 3:03 pm**

You remember stuff from nineteen mumble mumble

Posted: **Thu Jan 25, 2007 3:26 pm**

That's one of the secrets of my success.

The Three Secrets of Success
1. see above
2. don't tell them everything you know

Posted: **Thu Jan 25, 2007 3:29 pm**

And the third one is not for all, right?

Issues with Sequences