error msg about section leader

Post questions here related to DataStage Enterprise/PX Edition, in areas such as parallel job design, parallel datasets, BuildOps, Wrappers, etc.


dxwwq557
Participant
Posts: 16
Joined: Wed Nov 10, 2010 9:19 pm
Location: Chengdu


Post by dxwwq557 »

Hi,

In my parallel job, I use a Row Generator stage to create new rows and a Peek stage to view the output stream, but when I view the data, it returns an error message saying that it cannot contact one or more section leaders.

Then I compiled and ran the job, but it aborted with this error message: #main_program: The section leader on <DS_Server_Name> died.

First, I checked the configuration file in Manager: checking the default configuration file returns the same error message, "cannot contact one or more section leaders".

Following suggestions I found on the internet, I changed the environment variable APT_PM_NODE_TIMEOUT to 300, but the job still could not execute.

Then I changed the permissions of the file /etc/hosts.xxxxx on the Linux system, but the job still could not run.

I need your help.

Thanks
Best Regards

Post by dxwwq557 »

And here is the error message returned when checking the configuration file:


##I TFCN 000001 12:05:00(000) <main_program>
Ascential DataStage(tm) Enterprise Edition 7.5.2
Copyright (c) 2004, 1997-2004 Ascential Software Corporation.
All Rights Reserved


##I TOCK 000000 12:05:00(001) <main_program> OS charset: UTF-8.
##I TOCK 000000 12:05:00(002) <main_program> Input charset: UTF-8.
##I TFSC 000001 12:05:00(003) <main_program> APT configuration file: /home/dsadm/Ascential/DataStage/Configurations/default.apt
##I TFSC 000000 12:05:00(004) <main_program>
This step has 1 dataset:
ds0: {op0[2p] (parallel APT_CheckConfigOperator)
>>eCollectAny
op1[1p] (sequential APT_RealFileExportOperator in APT_FileExportOperator)}

It has 2 operators:
op0[2p] {(parallel APT_CheckConfigOperator)
on nodes (
node1[op0,p0]
node2[op0,p1]
)}
op1[1p] {(sequential APT_RealFileExportOperator in APT_FileExportOperator)
on nodes (
node1[op1,p0]
)}
It runs 3 processes on 2 nodes.
##W TFPM 000152 12:05:30(000) <main_program> Accept timed out retries = 8 [processmgr/newcontact.C:876]
##E TFPM 000153 12:05:30(001) <main_program> The section leader on DSEE died [processmgr/newcontact.C:885]
##E TFPM 000356 12:05:30(002) <main_program>

**** Parallel startup failed ****

This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.

[processmgr/spawn.C:300]
##I TFPM 000177 12:05:30(003) <main_program> Step started on node DSEE; it uses 2 nodes.
The program running the step is /home/dsadm/Ascential/DataStage/PXEngine/bin/orchadmin.

##I TFPM 000178 12:05:30(004) <main_program> The ORCHESTRATE startup program in /home/dsadm/Ascential/DataStage/PXEngine/etc/standalone.sh is being used.

##I TFPM 000181 12:05:30(005) <main_program> A startup script is not being used.

##I TFPM 000183 12:05:30(006) <main_program> The TCP port being used for startup is 10,000; the associated socket number is 5.

##I TFPM 000184 12:05:30(007) <main_program>
Node status:


##I TFPM 000185 12:05:30(008) <main_program> DSEE -
##I TFPM 000187 12:05:30(009) <main_program> rsh issued, no response received

##I TFPM 000185 12:05:30(010) <main_program> DSEE -
##I TFPM 000187 12:05:30(011) <main_program> rsh issued, no response received


##E TFPM 000247 12:05:30(012) <main_program> Unable to contact one or more Section Leaders.
Probable configuration problem; contact Orchestrate system administrator.
[processmgr/pmpar.C:268]
##E TFSR 000011 12:05:30(013) <main_program> Step execution finished with status = FAILED. [sc/sc_api.C:252]
##E TOCK 000000 12:05:30(014) <main_program> ERROR: check configuration file failed. [check_config.C:102]
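To see at a glance which nodes never answered during parallel startup, the "Node status" part of the output above can be scanned programmatically. The following is a minimal Python sketch, assuming the node-name message (TFPM 000185) and the "rsh issued, no response received" message (TFPM 000187) look exactly as they do in this 7.5.2 log; other releases may number or word them differently, and unresponsive_nodes is a hypothetical helper name.

```python
import re

# Patterns copied from the node-status lines in the log above (assumed stable format).
NODE_LINE = re.compile(r"TFPM 000185 \S+ <main_program> (\S+) -")
NO_RESPONSE = re.compile(r"TFPM 000187 \S+ <main_program> rsh issued, no response received")


def unresponsive_nodes(log_text: str) -> list[str]:
    """Return the host names whose section leader never answered the rsh startup request."""
    failed = []
    current = None  # host name from the most recent node-status line
    for line in log_text.splitlines():
        node = NODE_LINE.search(line)
        if node:
            current = node.group(1)
            continue
        if current is not None and NO_RESPONSE.search(line):
            failed.append(current)
            current = None  # each status line is paired with at most one result
    return failed
```

Run against the log above, it would report DSEE twice, presumably once per logical node (node1 and node2). That suggests the problem lies with how the host DSEE itself is reached rather than with a single node definition.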
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Have you checked that rsh has been configured to allow password-less login?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by dxwwq557 »

ray.wurlod wrote: Have you checked that rsh has been configured to allow password-less login? ...
How do I check this? Thanks.
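One way to check password-less rsh is to run a harmless remote command as the user that runs DataStage jobs (dsadm, judging by the paths in the log) and confirm that it completes without asking for a password. Below is a minimal Python sketch, assuming rsh is the remote shell in use (as the "rsh issued" messages suggest) and DSEE is the node's host name. The function name passwordless_ok and the rsh-ok marker are hypothetical. It checks the command's output rather than its exit status, because classic rsh does not pass the remote command's exit status back.

```python
import subprocess

MARKER = "rsh-ok"  # hypothetical marker string echoed back by the remote host


def passwordless_ok(host: str, remote_shell: str = "rsh", timeout: float = 10.0) -> bool:
    """Return True if `<remote_shell> <host> echo MARKER` prints MARKER without prompting."""
    try:
        result = subprocess.run(
            [remote_shell, host, "echo", MARKER],
            stdin=subprocess.DEVNULL,  # no interactive input: a prompt on stdin just gets EOF
            capture_output=True,
            text=True,
            timeout=timeout,  # a prompt read from the terminal, or a dead host, shows up as a hang
        )
    except FileNotFoundError:
        return False  # remote-shell program not installed or not on PATH
    except subprocess.TimeoutExpired:
        return False  # no answer in time: waiting for a password, or host unreachable
    # Classic rsh does not return the remote command's exit status, so inspect the output.
    return MARKER in result.stdout.split()
```

For example, passwordless_ok("DSEE") should return True when rsh is set up correctly. If it returns False, running `rsh DSEE echo rsh-ok` by hand shows whether rsh prompts for a password, is refused, or cannot reach the host.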

Post by dxwwq557 »

I have solved the problem: the IP address in the /etc/hosts file was set incorrectly, so the correct server could not be found.
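Since the cause turned out to be a wrong IP address in /etc/hosts, a quick sanity check is to look up the address the file assigns to each node's host name and compare it with the server's real address. Below is a minimal Python sketch, assuming the host name DSEE from the log. The 192.0.2.x addresses are placeholder documentation addresses, because the thread does not give the real ones, and hosts_lookup and check_host_entry are hypothetical helpers. It matches names case-insensitively and uses the first matching line, which is typically what the resolver returns from the hosts file.

```python
import ipaddress


def hosts_lookup(hosts_text: str, name: str) -> list[str]:
    """Return the addresses an /etc/hosts-format text assigns to `name`, in file order."""
    wanted = name.lower()  # host names are matched case-insensitively
    addresses = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank, comment-only, or malformed line
        address, names = fields[0], fields[1:]
        if wanted in (n.lower() for n in names):
            addresses.append(address)
    return addresses


def check_host_entry(hosts_text: str, name: str, expected_ip: str) -> str:
    """Compare the first address found for `name` with the address the server really has."""
    found = hosts_lookup(hosts_text, name)
    if not found:
        return f"{name}: no entry"
    try:
        actual = ipaddress.ip_address(found[0])
    except ValueError:
        return f"{name}: invalid address {found[0]}"
    if actual != ipaddress.ip_address(expected_ip):
        return f"{name}: maps to {found[0]}, expected {expected_ip}"
    return f"{name}: ok"
```

With a real system, the text can be read from /etc/hosts on each node and compared with the address the server actually has (for example, as shown by `ip addr`).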