default.apt

Posted: Fri Apr 04, 2008 7:42 am
by igorbmartins
Friends, could someone explain how to configure the default.apt file? I would like to configure 4 nodes.


I found the example below. Could you explain what each line means? What are n1, s1, s2, app1, app2 and bigdata? And why are some nodes configured differently from the others?

EXAMPLE:

{
  node "n1" {
    fastname "s1"
    pools "" "n1" "s1" "app2" "sort"
    resource disk "/orch/n1/d1" {}
    resource disk "/orch/n1/d2" {pools "bigdata"}
    resource scratchdisk "/temp" {pools "sort"}
  }
  node "n2" {
    fastname "s2"
    pools "" "n2" "s2" "app1"
    resource disk "/orch/n2/d1" {}
    resource disk "/orch/n2/d2" {pools "bigdata"}
    resource scratchdisk "/temp" {}
  }
  node "n3" {
    fastname "s3"
    pools "" "n3" "s3" "app1"
    resource disk "/orch/n3/d1" {}
    resource scratchdisk "/temp" {}
  }
  node "n4" {
    fastname "s4"
    pools "" "n4" "s4" "app1"
    resource disk "/orch/n4/d1" {}
    resource scratchdisk "/temp" {}
  }
}

Thanks

Igor Bastos Martins
http://www.oportunidadesembi.com.br

Posted: Fri Apr 04, 2008 4:10 pm
by ray.wurlod
node is a unique name for each (logical) processing node; it is used primarily in reporting error messages - internally DataStage uses node #0, node #1 and so on.

fastname is the network node name of the machine on which that processing node will execute. If you are only using one machine, the value of fastname must be the name of that machine in every node specification.

app1, app2, sort and so on are examples of "node pool" names; stages can be constrained to execute in a node pool (a subset of the available processing nodes). Some stages do so automatically - for example, the Sort stage seeks out a node pool called "sort".

bigdata is an example of a "disk pool" name. It's a similar concept to node pool - you can specify that a stage uses the directories mentioned in a disk pool.
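
Putting those terms together, a minimal annotated node entry might look like this (the host name, paths and pool names here are illustrative, not prescriptive):

```
{
  node "n1" {                                     /* logical node name, used in messages */
    fastname "serverhost"                         /* network name of the machine */
    pools "" "app1"                               /* "" = default node pool, "app1" = named pool */
    resource disk "/orch/n1/d1" {}                /* dataset storage, default disk pool */
    resource disk "/orch/n1/d2" {pools "bigdata"} /* used only via the "bigdata" disk pool */
    resource scratchdisk "/temp" {}               /* temporary space for sorts and buffering */
  }
}
```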

It is a good idea to configure a generous amount of scratchdisk space - sorting and buffering spill to it.


Posted: Tue Apr 29, 2008 3:14 pm
by vdr123
Do you guys see anything wrong in this default.apt?
I tried to run the seq gen stage in parallel and it gives me duplicate keys.

node "node1"
{
  fastname "SMIMetlt001mgt"
  pools ""
  resource disk "/data/DataStage/datasets" {pools ""}
  resource scratchdisk "/data/DataStage/scratch" {pools ""}
}
node "node2"
{
  fastname "SMIMetlt001mgt"
  pools ""
  resource disk "/data/DataStage/datasets" {pools ""}
  resource scratchdisk "/data/DataStage/scratch" {pools ""}
}

Posted: Wed Apr 30, 2008 12:12 am
by BugFree
The duplicates occur because the stage is started on 2 nodes, and each node starts counting from 0 or 1 (not sure which) and continues independently. So if you run on 2 nodes each key appears twice, and on 4 nodes four times.
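
What each partition's generator effectively does can be sketched in Python (illustrative only, not DataStage code; a starting value of 1 is assumed here):

```python
def naive_keys(num_partitions, rows_per_partition):
    """Simulate a generator whose counter restarts on every partition."""
    keys = []
    for _ in range(num_partitions):
        # each node counts from 1 independently, so the sequences collide
        keys.extend(range(1, rows_per_partition + 1))
    return keys

print(naive_keys(2, 3))  # [1, 2, 3, 1, 2, 3] - every key appears twice
```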

Posted: Wed Apr 30, 2008 1:31 am
by ray.wurlod
Unless, of course, you make the increment something like the number of partitions.

Posted: Wed Apr 30, 2008 3:52 am
by Minhajuddin
Use the Surrogate Key Generator stage to generate your keys, or use the system variables that give you the partition number (@PARTITIONNUM) and the number of partitions (@NUMPARTITIONS) to derive keys that are not duplicates.
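
The offset-and-stride scheme both replies describe can be sketched in Python (illustrative; the function name is mine, and it only mimics the role @PARTITIONNUM and @NUMPARTITIONS play in a parallel job):

```python
def partition_keys(partition_num, num_partitions, rows, start=1):
    """Unique keys per partition: offset by the partition number,
    then step by the partition count so sequences interleave."""
    return [start + partition_num + i * num_partitions for i in range(rows)]

print(partition_keys(0, 2, 3))  # [1, 3, 5]
print(partition_keys(1, 2, 3))  # [2, 4, 6] - no overlap across partitions
```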