Parallel Startup Failed Error, No Child Processes.

nelab28
Premium Member
Posts: 28
Joined: Fri Sep 24, 2004 1:25 am

Post by nelab28 »

Hi,

I have about 74,000 records in a dataset that I need to insert into a new table. When I run a parallel job consisting of a Data Set stage (source), a Transformer, and a target Oracle table, I get the following errors in the Director log:

main_program: APT_PMConnectionRecord::start: waitpid(18708, 0, 0) returned -1, No child processes

main_program: **** Parallel startup failed ****
This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.

main_program: A startup script is not being used.

main_program: Unable to contact one or more Section Leaders.
Probable configuration problem; contact Orchestrate system administrator.


The job uses a configuration file that defines four nodes.
Interestingly, I am able to create a table and load 5 million records using the same configuration file.

Any idea what could cause this problem?
:idea:
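
For readers puzzling over the first log line: "No child processes" is the operating system's standard text for the ECHILD error, which waitpid reports when the pid it is asked about is not, or is no longer, a child of the calling process. Below is a minimal Python sketch of that condition. It has nothing to do with Orchestrate internals, and the function name wait_after_reap is made up for illustration.

```python
import errno
import os
import subprocess
import sys


def wait_after_reap():
    """Start a child, reap it, then wait on the same pid a second time.

    Returns (errno, message) from the second wait, or None if the
    second wait unexpectedly succeeded.
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # first wait: the child is reaped here
    try:
        # Second wait on a pid that is no longer our child.
        os.waitpid(child.pid, 0)
    except ChildProcessError as exc:
        return exc.errno, os.strerror(exc.errno)
    return None


if __name__ == "__main__":
    print(wait_after_reap())
```

The point is only that the message describes the state of the parent/child relationship at the time of the call. It does not by itself say why the child was missing.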
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
If you post the full configuration of your system, it will be easier for people to give you an answer.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's rare, but this one can also occur if the process table on one machine has become full, so that no processes can be started.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nelab28
Premium Member
Posts: 28
Joined: Fri Sep 24, 2004 1:25 am

Post by nelab28 »

Hi Ray,

Can you please shed some more light on this?

I am able to load the same dataset into a sequential file. The error appears only when I load the Oracle table, whether the source is a sequential file or the dataset. In other words, parallel startup fails only when the target is the table.

Thanks in anticipation :)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The process table is a low level component in UNIX (think of the pid as a row number in this table). There is one entry associated with each process; it's used to keep track of timeslice, execution priority, and so on.

The fact that you can execute when not writing to a table seems to exonerate the process table as a candidate cause. Hence my original opening "it's rare".

Have you checked all the things mentioned in the error message? This seems to suggest an incomplete installation or incomplete configuration of the underlying "Orchestrate" engine, or insufficient remote execution access/privileges.
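
As a rough way to see how close a machine is to running out of process slots, the sketch below counts the processes owned by the current user and compares that with the per-user soft limit. It assumes a Linux-style /proc filesystem and Python's resource module; other UNIX flavours expose this differently, and the function names are made up for illustration. Treat the numbers as approximate, since the limit may also count threads on some systems.

```python
import os
import resource


def count_user_processes(uid=None, proc_dir="/proc"):
    """Count /proc entries that look like pids and belong to uid."""
    uid = os.getuid() if uid is None else uid
    count = 0
    for name in os.listdir(proc_dir):
        if not name.isdigit():
            continue  # not a pid directory
        try:
            if os.stat(os.path.join(proc_dir, name)).st_uid == uid:
                count += 1
        except OSError:
            continue  # the process exited while we were looking
    return count


def soft_process_limit():
    """Return the per-user soft process limit, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    print("processes owned by this user:", count_user_processes())
    print("soft limit (None means unlimited):", soft_process_limit())
```

Run it as the user that executes the parallel jobs. A count close to the limit would support the "process table full" theory, while plenty of headroom points back at the configuration.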
ailuro
Participant
Posts: 21
Joined: Wed Sep 10, 2003 11:09 pm
Location: GMT+8

Post by ailuro »

Have you specified a value for the property
Oracle Enterprise > Input > Properties > Connection > Remote Server?
คาร์โล ตัน
nelab28
Premium Member
Posts: 28
Joined: Fri Sep 24, 2004 1:25 am

Post by nelab28 »

Thanks,
The cause was that the Remote Server property had not been specified.
It was set in the earlier job, which loaded the table, but not in this job, hence the error.
The error message is rather confusing in this regard, though!
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Please paste your $APT_CONFIG_FILE configuration file here.

You can view it via Manager (Tools | Configurations...).

This will help guide our questions and answers for you.
nelab28
Premium Member
Posts: 28
Joined: Fri Sep 24, 2004 1:25 am

Post by nelab28 »

Hi T42,

Here is my configuration file.
The problem has been fixed; the cause was that the server name was not specified in the Oracle Enterprise stage.


{
    node "node0"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/opt/tempdataset/" {pools ""}
        resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
    }

    node "node1"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/z03/tempdataset/" {pools ""}
        resource scratchdisk "/z03/tempdataset/scratch/" {pools ""}
    }

    node "node2"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/tmp/tempdataset/" {pools ""}
        resource scratchdisk "/tmp/tempdataset/scratch/" {pools ""}
    }

    node "node3"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/opt/tempdataset/" {pools ""}
        resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
    }
}
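
For anyone who wants to sanity-check a configuration file like the one above, here is a minimal Python sketch that pulls out each node's fastname and resource paths and reports directories that do not exist. It is a simple line-based reader written for this layout, not a full parser for the configuration syntax, and the names parse_config, missing_dirs and SAMPLE are made up for illustration.

```python
import os
import re

NODE = re.compile(r'^\s*node\s+"([^"]+)"')
FASTNAME = re.compile(r'^\s*fastname\s+"([^"]+)"')
RESOURCE = re.compile(r'^\s*resource\s+(disk|scratchdisk)\s+"([^"]+)"')

# Two-node excerpt of the configuration posted above.
SAMPLE = '''
{
    node "node0"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/opt/tempdataset/" {pools ""}
        resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
    }
    node "node1"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/z03/tempdataset/" {pools ""}
        resource scratchdisk "/z03/tempdataset/scratch/" {pools ""}
    }
}
'''


def parse_config(text):
    """Return {node name: {"fastname", "disk", "scratchdisk"}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        match = NODE.match(line)
        if match:
            current = nodes.setdefault(
                match.group(1),
                {"fastname": None, "disk": [], "scratchdisk": []},
            )
            continue
        if current is None:
            continue  # ignore anything before the first node
        match = FASTNAME.match(line)
        if match:
            current["fastname"] = match.group(1)
            continue
        match = RESOURCE.match(line)
        if match:
            current[match.group(1)].append(match.group(2))
    return nodes


def missing_dirs(nodes):
    """List (node, kind, path) for resource paths that are not directories."""
    return [
        (name, kind, path)
        for name, node in nodes.items()
        for kind in ("disk", "scratchdisk")
        for path in node[kind]
        if not os.path.isdir(path)
    ]


if __name__ == "__main__":
    for entry in missing_dirs(parse_config(SAMPLE)):
        print("missing:", entry)
```

The directory check is only meaningful when run on the host named by fastname, as the user that runs the jobs.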
dsxuserrio
Participant
Posts: 82
Joined: Thu Dec 02, 2004 10:27 pm
Location: INDIA

Post by dsxuserrio »

Nelab
How was this dataset created? Was it created using the same configuration file? This kind of error frequently happens when the config file or the dataset is ported from dev to test.
Thanks
dsxuserrio

Kannan.N
Bangalore,INDIA
nelab28
Premium Member
Posts: 28
Joined: Fri Sep 24, 2004 1:25 am

Post by nelab28 »

Yes, the dataset was created using the same configuration file. Nothing was ported across projects or systems.

When porting projects from dev to test or production, checking the configuration file and keeping it consistent will avoid such errors.

My problem, though, was that the error message did not say that the database server had not been specified. Instead, the log pointed to the configuration or installation.
dsxuserrio
Participant
Posts: 82
Joined: Thu Dec 02, 2004 10:27 pm
Location: INDIA

Post by dsxuserrio »

Nelab
What is z03 (in node1)? Do you have a folder by that name, or is it a typo?
Have you resolved your problem?
nelab28 wrote:
node "node1"
{
    fastname "DBPU2"
    pools ""
    resource disk "/z03/tempdataset/" {pools ""}
    resource scratchdisk "/z03/tempdataset/scratch/" {pools ""}
}
dsxuserrio

Kannan.N
Bangalore,INDIA