very limited parallelism

jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

Hi,
We've just set up a new server with DS 8.0.1. The first tests seemed successful, but we've had some strange issues with jobs failing that then run successfully after a simple restart. We are not sure what the issue is, but we were able to reduce it to a very simple test case:
One job with a Row Generator producing a single numeric field (10 records). This goes to a Transformer that simply computes number + 1, and then to a Peek.

We tried this on a server with nothing else running and only one Director client connected. It runs fine with 7 nodes but fails with 8 nodes (a core dump is created). As soon as more clients connect, the job needs even fewer nodes to run successfully.
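In a parallel job the node count comes from the configuration file that APT_CONFIG_FILE points to, so reproducing the 7-node/8-node threshold means keeping two otherwise identical configuration files. The sketch below generates single-host files with N logical nodes; it assumes all nodes share one host and one pair of disk/scratch paths, and the host name and paths used are hypothetical placeholders.

```python
def make_apt_config(n_nodes: int, fastname: str, disk: str, scratch: str) -> str:
    """Build a single-host parallel configuration file with n_nodes logical nodes.

    Assumption: every node uses the same host (fastname) and the same
    resource disk / scratchdisk directories.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    blocks = []
    for i in range(1, n_nodes + 1):
        blocks.append(
            f'  node "node{i}"\n'
            "  {\n"
            f'    fastname "{fastname}"\n'
            '    pools ""\n'
            # Doubled braces render as literal { } inside the f-string.
            f'    resource disk "{disk}" {{pools ""}}\n'
            f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
            "  }\n"
        )
    return "{\n" + "".join(blocks) + "}\n"


if __name__ == "__main__":
    # Hypothetical host name and paths; substitute the server's real values.
    print(make_apt_config(8, "dsserver", "/data/ds/datasets", "/data/ds/scratch"))
```

Generating both files from the same function keeps everything except the node count identical, so any difference between the 7-node and 8-node runs can be attributed to the degree of parallelism alone.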

If I remove the Transformer from the job and replace it with other stages, it runs fine (I tried five other stages at once on 16 nodes).

We checked the ulimit settings as seen from within DataStage itself:
testuv..JobControl (ExecSH): Executed command: ulimit -a; id
*** Output from command was: ***
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 100
memory(kbytes) unlimited
uid=9889(Pdts) gid=9889(Pdts)

(We also tried increasing nofiles(descriptors) to 65000, which didn't help.)
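Because the limits that matter are the ones the DataStage processes actually inherit, capturing `ulimit -a` through ExecSH (as above) is the right source. A small sketch for checking such captured output automatically: it assumes the two-column "name value" layout shown in the log above (other shells print a different layout), and the 1024-descriptor threshold is an illustrative assumption, not an IBM-documented minimum.

```python
import re

# Illustrative floor only; not an official DataStage requirement.
DEFAULT_MINIMUMS = {"nofiles(descriptors)": 1024}


def parse_ulimit(text: str) -> dict:
    """Map each 'name value' line of ulimit -a output to an int, or None for unlimited.

    Lines that do not match (log headers, the `id` output) are ignored.
    """
    limits = {}
    for line in text.splitlines():
        m = re.match(r"^(\S+)\s+(unlimited|\d+)$", line.strip())
        if m:
            name, value = m.groups()
            limits[name] = None if value == "unlimited" else int(value)
    return limits


def low_limits(limits: dict, minimums: dict = DEFAULT_MINIMUMS) -> dict:
    """Return the limits that are set and fall below their configured floor."""
    return {
        name: limits[name]
        for name, floor in minimums.items()
        if limits.get(name) is not None and limits[name] < floor
    }


if __name__ == "__main__":
    captured = """time(seconds) unlimited
stack(kbytes) 8192
nofiles(descriptors) 100
uid=9889(Pdts) gid=9889(Pdts)"""
    print(low_limits(parse_ulimit(captured)))  # {'nofiles(descriptors)': 100}
```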

I have no idea where to start looking for this problem. On this forum I mostly find references to ulimit, or to problems under much higher load. (IBM support is either in holiday mode or also clueless, because there's no relevant information from that side either.)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What system condition is triggering the core dump? Also, do you have dbx installed? It lets you look at the stack in the core file (among other things), which might help you track down the cause.