Configuration-N-partitioning questions
Posted: Wed Dec 12, 2012 5:58 pm
A few questions have been bothering me, and after spending some time on them I thought I should ask you guys for enlightenment. It would help me immensely.
No, these are not interview stumpers.
I am told that the new configuration might produce different output (assuming the developer used Auto partitioning) even if we run on ONE node (with 4 resource and scratch disks in the APT file), since the data is partitioned across the 4 scratch and resource disk areas.
Notice that the new configuration has separate scratch and resource disk areas, while the legacy system has one directory (ISAPP) for both scratch and disk space.
My questions are:
-------------------
1) Is it true that even with one node we may get different results because the data is partitioned across four resource and scratch disks (assuming Auto partitioning)? In other words, I thought having separate resource and scratch disks is for I/O efficiency and does not affect data partitioning, and that it is the number of nodes that affects results. So if we update the APT file to only ONE node, and still use 4 resource/scratch disks, the job will run sequentially regardless of the number of resource and scratch disks used. Am I correct?
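My own understanding, for what it's worth: the degree of parallelism comes from the number of node entries in the configuration file, not from the number of disks listed under a node; multiple resource and scratch disks under one node are simply used round-robin for data set and scratch storage. So a one-node config along these lines (the paths below are illustrative, not from our servers) should run the flow sequentially even though it lists several disks:

Code:
{
	node "node0"
	{
		fastname "primary"
		pools ""
		resource disk "/isdataset0/dataset/project#" {pools ""}
		resource disk "/isdataset1/dataset/project#" {pools ""}
		resource scratchdisk "/isscratch0/Scratch/project#" {pools ""}
		resource scratchdisk "/isscratch1/Scratch/project#" {pools ""}
	}
}

Please correct me if the extra disks under a single node can change the results rather than just the storage layout.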
2) The IBM best practice guide says to use Auto partitioning and let the engine decide the best option, but from experience I know this option doesn't always produce the right results. What do you guys recommend?
3) Which of the following jobs is more efficient and follows best practice?
Job A is designed with a Join stage; on the Join stage's input links we use Hash partitioning on the key and perform the sort on the link before joining.
Job B is designed with a Sort stage just before the Join stage, so we sort the output before it goes into the Join; on the Join stage's input links we Hash partition it.
4) When we look at the dump score, we see that a tsort operator has been inserted into some stages. One experienced fellow told me that even though the score tells you that, don't trust it, because the data may still not be sorted properly; he recommended not using Auto partitioning, and instead choosing a partitioning option based on the data and sorting it yourself.
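As an aside, the score we are reading comes from the standard environment variable APT_DUMP_SCORE, which we enable in the job's environment before the run (this is a real DataStage variable; the shell-style line below is just one way to set it):

Code:
export APT_DUMP_SCORE=1

With this set, the generated score, including any inserted tsort and partitioner operators, appears in the job log.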
The following relates to question 1.
Legacy configuration:
A 2-node, one-server environment running AIX and DS 8.1.
Code:
{
node "node0"
{
fastname "primary"
pools ""
resource disk "/isApp0/dataset/project#" {pools ""}
resource disk "/isApp1/dataset/project#" {pools ""}
resource scratchdisk "/isApp0/Scratch/project#" {pools ""}
resource scratchdisk "/isApp1/Scratch/project#" {pools ""}
}
node "node1"
{
fastname "primary"
pools ""
resource disk "/isApp0/dataset/project#" {pools ""}
resource disk "/isApp1/dataset/project#" {pools ""}
resource scratchdisk "/isApp0/Scratch/project#" {pools ""}
resource scratchdisk "/isApp1/Scratch/project#" {pools ""}
}
}
New cluster configuration:
Two servers, 4 nodes each; the config file may not use all nodes. One is the primary server and has local scratch space, resource disk, WAS, and DB2; the other is a compute server with only the IIS engine and local scratch space (its disk space is mounted). Both run AIX and DS 8.7.
Code:
{
node "node1"
{
fastname "primary"
pools ""
resource disk "/isdataset0/dataset/project#" {pools ""}
resource disk "/isdataset1/dataset/project#" {pools ""}
resource disk "/isdataset2/dataset/project#" {pools ""}
resource disk "/isdataset3/dataset/project#" {pools ""}
resource scratchdisk "/isscratch0/Scratch/project#" {pools ""}
resource scratchdisk "/isscratch1/Scratch/project#" {pools ""}
resource scratchdisk "/isscratch2/Scratch/project#" {pools ""}
resource scratchdisk "/isscratch3/Scratch/project#" {pools ""}
}
node "node2"
{
fastname "compute"
pools ""
resource disk "/isdataset0/dataset/project#" {pools ""}
resource disk "/isdataset1/dataset/project#" {pools ""}
resource disk "/isdataset2/dataset/project#" {pools ""}
resource disk "/isdataset3/dataset/project#" {pools ""}
resource scratchdisk "/isscratch0/Scratch/project#" {pools ""}
resource scratchdisk "/isscratch1/Scratch/project#" {pools ""}
resource scratchdisk "/isscratch2/Scratch/project#" {pools ""}
resource scratchdisk "/isscratch3/Scratch/project#" {pools ""}
}
}