Hi,
In my parallel job, I use a Row_Generator stage to create new rows and a Peek stage to get the output stream, but when I view the data it returns an error message saying that it cannot contact one or more section leaders.
Then I compiled the job and ran it, but it aborted with the error message #main_program: The section leader on <DS_Server_Name> died.
First, I checked the configuration file in Manager; the default file returns the same error message: cannot contact one or more section leaders.
Then I searched the internet and, to try to solve the problem, changed the environment variable APT_PM_NODE_TIMEOUT to 300, but the job still cannot execute.
Then I changed the permissions of the file /etc/hosts.xxxxx on the Linux system, but the job still cannot run.
So I need your help.
Thanks
Best Regards
error msg about section leader
The error message returned when checking the configuration file is as follows:
##I TFCN 000001 12:05:00(000) <main_program>
Ascential DataStage(tm) Enterprise Edition 7.5.2
Copyright (c) 2004, 1997-2004 Ascential Software Corporation.
All Rights Reserved
##I TOCK 000000 12:05:00(001) <main_program> OS charset: UTF-8.
##I TOCK 000000 12:05:00(002) <main_program> Input charset: UTF-8.
##I TFSC 000001 12:05:00(003) <main_program> APT configuration file: /home/dsadm/Ascential/DataStage/Configurations/default.apt
##I TFSC 000000 12:05:00(004) <main_program>
This step has 1 dataset:
ds0: {op0[2p] (parallel APT_CheckConfigOperator)
>>eCollectAny
op1[1p] (sequential APT_RealFileExportOperator in APT_FileExportOperator)}
It has 2 operators:
op0[2p] {(parallel APT_CheckConfigOperator)
on nodes (
node1[op0,p0]
node2[op0,p1]
)}
op1[1p] {(sequential APT_RealFileExportOperator in APT_FileExportOperator)
on nodes (
node1[op1,p0]
)}
It runs 3 processes on 2 nodes.
##W TFPM 000152 12:05:30(000) <main_program> Accept timed out retries = 8 [processmgr/newcontact.C:876]
##E TFPM 000153 12:05:30(001) <main_program> The section leader on DSEE died [processmgr/newcontact.C:885]
##E TFPM 000356 12:05:30(002) <main_program>
**** Parallel startup failed ****
This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.
[processmgr/spawn.C:300]
##I TFPM 000177 12:05:30(003) <main_program> Step started on node DSEE; it uses 2 nodes.
The program running the step is /home/dsadm/Ascential/DataStage/PXEngine/bin/orchadmin.
##I TFPM 000178 12:05:30(004) <main_program> The ORCHESTRATE startup program in /home/dsadm/Ascential/DataStage/PXEngine/etc/standalone.sh is being used.
##I TFPM 000181 12:05:30(005) <main_program> A startup script is not being used.
##I TFPM 000183 12:05:30(006) <main_program> The TCP port being used for startup is 10,000; the associated socket number is 5.
##I TFPM 000184 12:05:30(007) <main_program>
Node status:
##I TFPM 000185 12:05:30(008) <main_program> DSEE -
##I TFPM 000187 12:05:30(009) <main_program> rsh issued, no response received
##I TFPM 000185 12:05:30(010) <main_program> DSEE -
##I TFPM 000187 12:05:30(011) <main_program> rsh issued, no response received
##E TFPM 000247 12:05:30(012) <main_program> Unable to contact one or more Section Leaders.
Probable configuration problem; contact Orchestrate system administrator.
[processmgr/pmpar.C:268]
##E TFSR 000011 12:05:30(013) <main_program> Step execution finished with status = FAILED. [sc/sc_api.C:252]
##E TOCK 000000 12:05:30(014) <main_program> ERROR: check configuration file failed. [check_config.C:102]
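To make the failing steps easier to spot in a log like the one above, here is a minimal Python sketch that keeps only the warning (##W) and error (##E) messages. The header pattern is inferred from the lines in this post only, and the names HEADER, problem_messages and sample are hypothetical helpers for illustration, not part of DataStage.

```python
import re

# Header layout as it appears in the log above (an assumption based on this
# post, not on product documentation):
#   ##<severity> <module> <number> <hh:mm:ss>(<seq>) <source> <text>
HEADER = re.compile(
    r"^##(?P<severity>[IWE]) (?P<module>\w+) (?P<number>\d+) "
    r"(?P<time>\d\d:\d\d:\d\d)\((?P<seq>\d+)\) <(?P<source>[^>]+)>\s*(?P<text>.*)$"
)


def problem_messages(log_text):
    """Return (severity, module, number, text) for each warning or error header line."""
    found = []
    for line in log_text.splitlines():
        match = HEADER.match(line)
        # Continuation lines (no "##" header) and informational lines are skipped.
        if match and match.group("severity") in ("W", "E"):
            found.append(
                (
                    match.group("severity"),
                    match.group("module"),
                    match.group("number"),
                    match.group("text"),
                )
            )
    return found


# Three header lines copied from the log in this post.
sample = "\n".join(
    [
        "##I TOCK 000000 12:05:00(001) <main_program> OS charset: UTF-8.",
        "##W TFPM 000152 12:05:30(000) <main_program> Accept timed out retries = 8 [processmgr/newcontact.C:876]",
        "##E TFPM 000153 12:05:30(001) <main_program> The section leader on DSEE died [processmgr/newcontact.C:885]",
    ]
)

for severity, module, number, text in problem_messages(sample):
    print(severity, module, number, text)
```

The sketch only looks at header lines, so multi-line message bodies (such as the "Parallel startup failed" explanation) are not captured; it is meant for a quick overview, not a full parser.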