Hi Ray,
Thanks for the reply; that helps clear up a few items.
A few additional questions regarding the nodes, especially in the GRID world. I am trying to understand how some of the PX GRID parameters boil down to what actually runs during execution, and I would greatly appreciate your feedback if I am not interpreting the log messages correctly.
The configuration file that the resource manager generates dynamically contains node sections according to the values specified for the GRID environment variables.
If I specify
$APT_GRID_COMPUTENODES = 2 and
$APT_GRID_PARTITIONS = 4, then the dynamic config file contains the sections given below.
In effect, this is what you explained in your earlier response. Are these the entries one should look at to get an idea of how many nodes and instances are being launched in the back end?
Code:
<Dynamic_gird.sh> SEQFILE Host(s): ctpcqabdsc01p: ctpcqabdsc02p:
{
node "Conductor"
{
fastname "ctpcqabdsh01p"
pools "conductor"
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node1_1"
{
fastname "ctpcqabdsc01p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node1_2"
{
fastname "ctpcqabdsc01p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node1_3"
{
fastname "ctpcqabdsc01p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node1_4"
{
fastname "ctpcqabdsc01p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node2_1"
{
fastname "ctpcqabdsc02p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node2_2"
{
fastname "ctpcqabdsc02p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node2_3"
{
fastname "ctpcqabdsc02p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
node "node2_4"
{
fastname "ctpcqabdsc02p"
pools ""
resource disk "/nfsdata/data1/datasets" {pools ""}
resource scratchdisk "/scratch" {pools ""}
}
}
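If I am reading the file above correctly, the layout is just arithmetic: one "Conductor" section plus $APT_GRID_COMPUTENODES x $APT_GRID_PARTITIONS = 2 x 4 = 8 compute sections, 9 node sections in total. Here is a minimal Python sketch of that layout, purely to check my understanding; the host names and paths are copied from the file above, and this is of course not the toolkit's actual generation code.
Code:
def node_section(name, fastname, pools,
                 disk="/nfsdata/data1/datasets", scratch="/scratch"):
    # One node stanza in the style of the dynamic config file above.
    return (f'  node "{name}"\n'
            f'  {{\n'
            f'    fastname "{fastname}"\n'
            f'    pools {pools}\n'
            f'    resource disk "{disk}" {{pools ""}}\n'
            f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
            f'  }}\n')

def build_config(conductor, compute_hosts, partitions):
    # Conductor section plus one logical node per compute host per partition.
    sections = [node_section("Conductor", conductor, '"conductor"')]
    for h, host in enumerate(compute_hosts, start=1):      # $APT_GRID_COMPUTENODES hosts
        for p in range(1, partitions + 1):                  # $APT_GRID_PARTITIONS per host
            sections.append(node_section(f"node{h}_{p}", host, '""'))
    return "{\n" + "".join(sections) + "}\n"

print(build_config("ctpcqabdsh01p", ["ctpcqabdsc01p", "ctpcqabdsc02p"], 4))
# 1 conductor section + 2 x 4 = 8 compute sections = 9 node sections in total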
The debug information about datasets shows
Code:
main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential SF_RCPInputFile)
eAny<>eCollectAny
op1[8p] (parallel APT_TransformOperatorImplV0S1_GEMAsCollectedLoader_V2_Job_XFM_RCP_2_Database in XFM_RCP_2_Database)}
ds1: {op0[1p] (sequential SF_RCPInputFile)
->eCollectAny
op3[1p] (sequential APT_RealFileExportOperator in SF_Source_Rejects)}
ds2: {op1[8p] (parallel APT_TransformOperatorImplV0S1_GEMAsCollectedLoader_V2_Job_XFM_RCP_2_Database in XFM_RCP_2_Database)
eAny=>eCollectAny
op2[8p] (parallel ODB_AsCollected)}
It has 4 operators:
op0[1p] {(sequential SF_RCPInputFile)
on nodes (
Conductor[op0,p0]
)}
op1[8p] {(parallel APT_TransformOperatorImplV0S1_GEMAsCollectedLoader_V2_Job_XFM_RCP_2_Database in XFM_RCP_2_Database)
on nodes (
node1_1[op1,p0]
node1_2[op1,p1]
node1_3[op1,p2]
node1_4[op1,p3]
node2_1[op1,p4]
node2_2[op1,p5]
node2_3[op1,p6]
node2_4[op1,p7]
)}
op2[8p] {(parallel ODB_AsCollected)
on nodes (
node1_1[op2,p0]
node1_2[op2,p1]
node1_3[op2,p2]
node1_4[op2,p3]
node2_1[op2,p4]
node2_2[op2,p5]
node2_3[op2,p6]
node2_4[op2,p7]
)}
op3[1p] {(sequential APT_RealFileExportOperator in SF_Source_Rejects)
on nodes (
node1_1[op3,p0]
)}
It runs 18 processes on 9 nodes.
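If I am adding this up correctly, the "18 processes on 9 nodes" line is just the sum of the player counts per operator (op0[1p] + op1[8p] + op2[8p] + op3[1p] = 18), spread over the Conductor node plus the 8 logical compute nodes. A trivial check of that arithmetic (my own tally, not anything the engine prints):
Code:
# Tally of the score above: players per operator, and the distinct logical nodes they run on.
operators = {"op0": 1, "op1": 8, "op2": 8, "op3": 1}      # degree of parallelism per operator

nodes = {"Conductor"}                                      # op0 runs on the Conductor node
nodes |= {f"node{h}_{p}" for h in (1, 2) for p in range(1, 5)}  # op1/op2 span all 8 logical nodes

print(f"{sum(operators.values())} processes on {len(nodes)} nodes")  # -> 18 processes on 9 nodes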
Further down, the execution log in the Director shows
Code:
main_program: APT_PM_StartProgram: Locally - /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine -APT_PMprotoSectionLeaderFlag --APTNoSetupProgram /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/bin/osh.exe -APT_PMsectionLeaderFlag ctpcqabdsh01p 10002 0 30 Conductor ctpcqabdsh01p 1212125620.477694.1de9 0 -os_charset UTF-8
APT_PM_StartProgram: Remotely - /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/remsh -n ctpcqabdsc01p /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine -APT_PMprotoSectionLeaderFlag --APTNoSetupProgram /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/bin/osh.exe -APT_PMsectionLeaderFlag ctpcqabdsh01p 10002 1 30 node1_1 ctpcqabdsc01p 1212125620.477694.1de9 0 -os_charset UTF-8
APT_PM_StartProgram: Remotely - /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/remsh -n ctpcqabdsc01p /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine -APT_PMprotoSectionLeaderFlag --APTNoSetupProgram /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/bin/osh.exe -APT_PMsectionLeaderFlag ctpcqabdsh01p 10002 2 30 node1_2 ctpcqabdsc01p 1212125620.477694.1de9 0 -os_charset UTF-8
...
...
...
truncated for brevity...
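The way I read these startup messages, the conductor starts one section leader per logical node in the config file: locally through standalone.sh for the "Conductor" entry, and via remsh to the compute hosts for node1_1 through node2_4, with each section leader then forking its players. The snippet below is just my own parsing of those messages to count section leaders per host; the list only holds the lines shown above, with the truncated ones represented by a comment.
Code:
import re
from collections import Counter

# My own tally of the APT_PM_StartProgram messages above: one section leader
# per logical node, started locally for "Conductor" and via remsh elsewhere.
start_program_lines = [
    "APT_PM_StartProgram: Locally - ... Conductor ctpcqabdsh01p ...",
    "APT_PM_StartProgram: Remotely - .../remsh -n ctpcqabdsc01p ... node1_1 ...",
    "APT_PM_StartProgram: Remotely - .../remsh -n ctpcqabdsc01p ... node1_2 ...",
    # ... remaining lines (node1_3 .. node2_4) omitted, as in the truncated log ...
]

leaders_per_host = Counter()
for line in start_program_lines:
    remote = re.search(r"remsh -n (\S+)", line)
    leaders_per_host[remote.group(1) if remote else "conductor host (local)"] += 1

for host, count in leaders_per_host.items():
    print(f"{host}: {count} section leader(s)")
# With all 9 startup lines present, this should show 1 local leader and 4 per compute host.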
While these two compute nodes are working on this job, there is no way any other job can be dispatched to them, correct?
Also, I have not used the "Multiple readers per node" option in my design. Is it safe to assume that the input data is automatically partitioned by the engine across these 8 connections to the database?
Many thanks for your invaluable time,