We are migrating jobs from 7.5.1 to an 8.7 grid environment.
Job Design
Oracle Connector -> Sort (2 key columns) -> Transformer (record comparison, transformations, and 4 output links) -> Funnel -> Remove Duplicates (based on 4 columns) -> Sequential File
Problem
1. The job processes about 120 million records and takes 8 hours to complete on a 2x2 grid environment
2. It fills 85% of the scratch space on the head node
Is there any way to avoid filling up the scratch disk? I would also like to know how to build scratch disk pools on the compute nodes.
Thank you
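(For reference, scratch disk pools are declared per node in the APT configuration file. A minimal sketch, with placeholder node names and paths; the pool name "scratch_local" and the path "/local/scratch" are illustrative, not from this environment:)

```
{
	node "Compute1"
	{
		fastname "xxx_ServerName"
		pools ""
		resource disk "/opt/<resource disk>" {pools ""}
		resource scratchdisk "/local/scratch" {pools "" "scratch_local"}
	}
}
```

With a named pool like this on each compute node, scratch I/O can be kept on the compute nodes' local disks instead of the head node.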
DataStage Sterling
85% Scratch disk usage on head node in Grid environment
Can you show the APT file as created by the job?
There should be a setting in global_grid_values in your $GRIDHOME (or overridden at your project level) that deals with executing on the conductor; can you tell us what that is? (Going off of memory on that one, might be wrong.)
Did you disable the Head Node from accepting grid jobs?
If using Platform LSF, type: bhosts
Then look to see if that Head Node is "closed".
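That check can be sketched by parsing bhosts output; the sample output below is illustrative of the bhosts format only, not from the poster's grid:

```shell
# Sample bhosts output (illustrative); a host an admin has closed shows 'closed_Adm'
bhosts_output='HOST_NAME   STATUS     JL/U MAX NJOBS RUN SSUSP USUSP RSV
headnode01  closed_Adm  -    8   0     0   0     0     0
compute01   ok          -    8   4     4   0     0     0'

# Flag any host whose status starts with "closed" (i.e. not accepting new jobs)
printf '%s\n' "$bhosts_output" | awk 'NR > 1 && $2 ~ /^closed/ { print $1 " is closed to new jobs" }'
```

To actually close the head node, an LSF administrator would run badmin hclose <hostname>.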
Configuration File
Grid values from Job Log
Yes, we are using Platform LSF, and the head node is not closed.
Code:
IIS-DSEE-DYNG0014 <Dynamic_grid.sh>Information: SEQFILE Host(s): xxx_ServerName: xxx_ServerName:
{
node "Conductor"
{
fastname "zzz_ServerName"
pools "conductor"
resource disk "/opt/<resource disk>" {pools ""}
resource scratchdisk "/opt/<scratch disk>" {pools ""}
}
node "Compute1"
{
fastname "xxx_ServerName"
pools ""
resource disk "/opt/<resource disk>" {pools ""}
resource scratchdisk "/opt/<scratch disk>" {pools ""}
}
node "Compute2"
{
fastname "xxx_ServerName"
pools ""
resource disk "/opt/<resource disk>" {pools ""}
resource scratchdisk "/opt/<scratch disk>" {pools ""}
}
node "Compute3"
{
fastname "xxx_ServerName"
pools ""
resource disk "/opt/<resource disk>" {pools ""}
resource scratchdisk "/opt/<scratch disk>" {pools ""}
}
node "Compute4"
{
fastname "xxx_ServerName"
pools ""
resource disk "/opt/<resource disk>" {pools ""}
resource scratchdisk "/opt/<scratch disk>" {pools ""}
}
}
IIS-DSEE-OSHC0007 <osh_conductor>Information: Authorized to proceed.
Code:
APT_GRID_COMPUTENODES=2
APT_GRID_CONFIG=
APT_GRID_ENABLE=YES
APT_GRID_IDENTIFIER=
APT_GRID_OPTS=
APT_GRID_PARTITIONS=2
APT_GRID_QUEUE=
APT_GRID_SCRIPTPOST=
APT_GRID_SCRIPTPRE=
APT_GRID_SEQFILE_HOST=
APT_GRID_SEQFILE_HOST2=
APT_GRID_STAT_CMD=
Last edited by DataStage_Sterling on Mon Mar 03, 2014 10:50 am, edited 1 time in total.
Well, your APT_GRID_CONFIG must be set; otherwise the grid enablement toolkit will use your default.apt, which is NOT grid friendly. Didn't IBM explain that to your admins?
That is why you ran out of scratch. It also looks like the scratch and dataset space you've been using is under the tool installation mount (which would also indicate that you may be using the default.apt).
I'm surprised you don't have APT_GRID_QUEUE defined either. Not a requirement, but a best practice to set; otherwise you'll be submitting to whatever the default queue is, probably the NORMAL queue.
PM me who from IBM services helped you guys set up that grid. I don't think you got your money's worth.
At least they put the Conductor node as Conductor pool and not blank.
Are you Platform LSF or Load Leveler?
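Setting those two variables might look like this (in dsenv or as job parameters); the path and queue name below are placeholders for illustration, not actual values:

```shell
# Placeholder path: point APT_GRID_CONFIG at a grid-aware configuration
# template, not the sequential default.apt shipped with the install.
export APT_GRID_CONFIG=/opt/IBM/InformationServer/Server/Configurations/grid_config.apt

# Placeholder queue name: submit to an explicit LSF queue instead of the default.
export APT_GRID_QUEUE=datastage_queue
```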
lstsaur wrote: Scratch space of the compute nodes, minimum 25 GB, must be a local disk, not NAS-mounted or NFS-mounted. It seems like your job's scratch processing is all done on the head node. No wonder the job takes longer to finish.
It seems that it was a local disk before, but for better maintenance and performance it was NFS-mounted.
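A quick way to verify whether a scratch path is local or NFS-mounted is to check its filesystem type; a sketch using GNU stat, where the path is a placeholder to be replaced with the node's actual scratchdisk directory:

```shell
# Placeholder path; point this at the node's scratchdisk directory
scratch="${SCRATCH_DIR:-/tmp}"

# stat -f -c %T prints the filesystem type (e.g. ext4, xfs, tmpfs, nfs)
fstype=$(stat -f -c %T "$scratch")

case "$fstype" in
  nfs*) echo "$scratch is NFS-mounted ($fstype): move scratch to a local disk" ;;
  *)    echo "$scratch is local ($fstype)" ;;
esac
```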