
Parallel Startup Failed Error, No Child Processes.

Posted: Mon Jan 31, 2005 5:45 am
by nelab28
Hi,

I have around 74K records in a dataset, which I have to insert into a new table. When I run a parallel job that has a Dataset (source), a Transformer and a target Oracle table, I get the following errors in the Director:

main_program: APT_PMConnectionRecord::start: waitpid(18708, 0, 0) returned -1, No child processes

main_program: **** Parallel startup failed ****
This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.

main_program: A startup script is not being used.

main_program: Unable to contact one or more Section Leaders.
Probable configuration problem; contact Orchestrate system administrator.


For the job, I have four nodes defined in the configuration file.
Interestingly, I am able to create a table and load 5 million records using the same configuration file.

Any idea what could cause such a problem?

Posted: Mon Jan 31, 2005 6:05 am
by roy
Hi,
if you supply the full configuration of your system,
it might help people give an answer.

Posted: Mon Jan 31, 2005 3:23 pm
by ray.wurlod
It's rare, but this one can also occur if the process table on one machine has become full, so that no processes can be started.

Posted: Mon Jan 31, 2005 7:51 pm
by nelab28
Hi Ray,

Can you please shed some more light on this?

Actually, I am able to load the same dataset into a sequential file.

I hit this error only when I try to load the table, whether the source is a sequential file or a dataset.

The parallel startup fails only when loading into the table.

Thanks in anticipation :)

Posted: Mon Jan 31, 2005 11:23 pm
by ray.wurlod
The process table is a low level component in UNIX (think of the pid as a row number in this table). There is one entry associated with each process; it's used to keep track of timeslice, execution priority, and so on.

The fact that you can execute when not writing to a table seems to exonerate the process table as a candidate cause. Hence my original opening "it's rare".

Have you checked all the things mentioned in the error message? This seems to suggest an incomplete installation or incomplete configuration of the underlying "Orchestrate" engine, or insufficient remote execution access/privileges.
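For reference, the `waitpid(18708, 0, 0) returned -1, No child processes` line from the first post is the standard UNIX `ECHILD` error: `waitpid()` only reports on children of the calling process, so it fails when the pid is not, or is no longer, a child of the caller. The sketch below reproduces that error in Python on a POSIX system. `describe_wait` and `demo` are hypothetical helpers, and the message format only imitates the DataStage log; the sketch does not explain why the Section Leader was missing in this particular case.

```python
import errno
import os


def describe_wait(pid: int) -> str:
    """Block in waitpid(pid, 0) and describe the outcome.

    Hypothetical helper: on failure it formats the message the way the
    DataStage log line does.
    """
    try:
        waited_pid, status = os.waitpid(pid, 0)
    except OSError as exc:
        if exc.errno != errno.ECHILD:
            raise  # only ECHILD ("No child processes") is illustrated here
        return f"waitpid({pid}, 0, 0) returned -1, {os.strerror(exc.errno)}"
    if os.WIFEXITED(status):
        return f"child {waited_pid} exited with code {os.WEXITSTATUS(status)}"
    return f"child {waited_pid} ended with raw status {status}"


def demo() -> list[str]:
    """Fork one child, then wait on it twice and on our own parent once."""
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit at once with a recognisable code
    return [
        describe_wait(pid),           # our child: waitpid succeeds and reaps it
        describe_wait(pid),           # already reaped: no longer our child
        describe_wait(os.getppid()),  # our parent is never our child
    ]


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The first string describes the child exiting with code 3; the other two should end in the C library's text for `ECHILD` ("No child processes" on common systems).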

Posted: Tue Feb 01, 2005 12:14 am
by ailuro
Have you specified a value for the property
Oracle Enterprise > Input > Properties > Connection > Remote Server?

Posted: Tue Feb 01, 2005 12:44 am
by nelab28
Thanks,
Not specifying the remote server was the cause.
It was set in the earlier job, which loaded the table, but not in this job, hence the error.
The error message is a bit confusing in this regard, though!

Posted: Tue Feb 01, 2005 1:55 pm
by T42
Please paste your $APT_CONFIG_FILE configuration file here.

You can view it via Manager (Tools | Configurations...).

This will help guide our questions and answers for you.

Posted: Tue Feb 01, 2005 9:37 pm
by nelab28
Hi T42,

Here is my configuration file.
The problem has been fixed; the cause was "not specifying the server name in the Oracle Enterprise stage".


{
    node "node0"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/opt/tempdataset/" {pools ""}
        resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
    }

    node "node1"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/z03/tempdataset/" {pools ""}
        resource scratchdisk "/z03/tempdataset/scratch/" {pools ""}
    }

    node "node2"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/tmp/tempdataset/" {pools ""}
        resource scratchdisk "/tmp/tempdataset/scratch/" {pools ""}
    }

    node "node3"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/opt/tempdataset/" {pools ""}
        resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
    }
}
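A quick way to sanity-check a file like this is a small script. The Python below is a hypothetical, line-oriented reader (not a DataStage tool) that lists the nodes, the distinct `fastname` hosts, and any disk or scratch directory named by more than one node. It assumes the one-statement-per-line layout posted above, reads the file named by `$APT_CONFIG_FILE` if that variable is set, and otherwise falls back to the posted configuration.

```python
import os
import re
from collections import defaultdict

# The configuration posted above (blank lines dropped), used as a fallback
# when $APT_CONFIG_FILE is not set.
SAMPLE_CONFIG = """
{
  node "node0"
  {
    fastname "DBPU2"
    pools ""
    resource disk "/opt/tempdataset/" {pools ""}
    resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
  }
  node "node1"
  {
    fastname "DBPU2"
    pools ""
    resource disk "/z03/tempdataset/" {pools ""}
    resource scratchdisk "/z03/tempdataset/scratch/" {pools ""}
  }
  node "node2"
  {
    fastname "DBPU2"
    pools ""
    resource disk "/tmp/tempdataset/" {pools ""}
    resource scratchdisk "/tmp/tempdataset/scratch/" {pools ""}
  }
  node "node3"
  {
    fastname "DBPU2"
    pools ""
    resource disk "/opt/tempdataset/" {pools ""}
    resource scratchdisk "/opt/tempdataset/scratch/" {pools ""}
  }
}
"""

NODE_RE = re.compile(r'node\s+"([^"]+)"')
FASTNAME_RE = re.compile(r'fastname\s+"([^"]+)"')
RESOURCE_RE = re.compile(r'resource\s+(disk|scratchdisk)\s+"([^"]+)"')


def parse_nodes(text: str) -> dict:
    """Map each node name to its fastname and disk/scratchdisk paths.

    Line-oriented sketch: assumes one statement per line, as in the posted
    file. It is not a parser for the full configuration-file grammar.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := NODE_RE.match(line):
            current = {"fastname": None, "disk": [], "scratchdisk": []}
            nodes[m.group(1)] = current
        elif current is not None and (m := FASTNAME_RE.match(line)):
            current["fastname"] = m.group(1)
        elif current is not None and (m := RESOURCE_RE.match(line)):
            current[m.group(1)].append(m.group(2))
    return nodes


def shared_paths(nodes: dict, kind: str) -> dict:
    """Return {path: [node, ...]} for `kind` paths named by more than one node."""
    owners = defaultdict(list)
    for name, node in nodes.items():
        for path in node[kind]:
            owners[path.rstrip("/")].append(name)
    return {path: names for path, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    config_path = os.environ.get("APT_CONFIG_FILE")
    if config_path:
        with open(config_path) as fh:
            text = fh.read()
    else:
        text = SAMPLE_CONFIG
    nodes = parse_nodes(text)
    hosts = sorted({str(node["fastname"]) for node in nodes.values()})
    print(f"{len(nodes)} node(s) on host(s): {', '.join(hosts)}")
    for kind in ("disk", "scratchdisk"):
        for path, names in shared_paths(nodes, kind).items():
            print(f"{kind} {path} is used by: {', '.join(names)}")
```

On the posted file it reports four nodes, all with fastname DBPU2 (which suggests all four partitions run on the one machine), and shows node0 and node3 both pointing at /opt/tempdataset. A shared directory is more likely a tuning concern than an error; the sketch only flags it.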

Posted: Fri Feb 04, 2005 12:08 pm
by dsxuserrio
Nelab
How was this dataset created? Was it created using the same configuration file? This kind of error frequently happens when the config file or dataset is ported from dev to test.
Thanks
dsxuserrio

Posted: Sun Feb 06, 2005 10:02 pm
by nelab28
Yes, the dataset was created using the same configuration file. No porting was done across projects or systems.

That said, when porting projects from dev to test/production, checking the configuration file and keeping it consistent will avoid such errors.

But my complaint is that the error message did not say that "the db server" had not been specified. Instead, the log suggests taking another look at the configuration/installation.

Posted: Thu Feb 24, 2005 1:55 pm
by dsxuserrio
Nelab
What is z03 (node1)? Do you have a folder by that name, or is it a typo?
Have you resolved your problem?
nelab28 wrote:
    node "node1"
    {
        fastname "DBPU2"
        pools ""
        resource disk "/z03/tempdataset/" {pools ""}
        resource scratchdisk "/z03/tempdataset/scratch/" {pools ""}
    }
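To answer a question like "does /z03 exist?" for every entry at once, a hypothetical Python check (not part of DataStage) can pull each `resource disk` / `resource scratchdisk` path out of the configuration text and test it. Assumptions: since every node in this file has fastname DBPU2, running the check on that one host should cover all nodes; in a multi-host configuration each path would presumably need checking on the host its node names. It tests existence and write permission only for the user running it, who may not be the user the jobs run as.

```python
import os
import re
import tempfile

# Matches lines such as:  resource disk "/z03/tempdataset/" {pools ""}
RESOURCE_RE = re.compile(r'resource\s+(disk|scratchdisk)\s+"([^"]+)"')


def check_resource_dirs(config_text: str) -> list:
    """Return (kind, path, problem) for each resource directory that is
    missing or not writable for the current user on this machine."""
    problems = []
    for kind, path in RESOURCE_RE.findall(config_text):
        if not os.path.isdir(path):
            problems.append((kind, path, "missing"))
        elif not os.access(path, os.W_OK):
            problems.append((kind, path, "not writable"))
    return problems


def demo() -> list:
    """Check a two-line config: one real temp directory, one missing one."""
    with tempfile.TemporaryDirectory() as existing:
        missing = os.path.join(existing, "no_such_dir")
        text = (f'resource disk "{existing}" {{pools ""}}\n'
                f'resource scratchdisk "{missing}" {{pools ""}}\n')
        return check_resource_dirs(text)


if __name__ == "__main__":
    config_path = os.environ.get("APT_CONFIG_FILE")
    if config_path:
        with open(config_path) as fh:
            results = check_resource_dirs(fh.read())
    else:
        results = demo()
    for kind, path, problem in results:
        print(f"{kind} {path}: {problem}")
    if not results:
        print("all resource directories exist and are writable")
```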