Understanding SCORE

Post by **karthi_gana** » Wed Mar 28, 2012 5:27 am

All,

I have two sets of SCORE output for a given job.

Seq File --> Filter --> Dataset

#1:

For SeqFile --> Keep File partition = True

For Dataset : --> Partition = Auto

main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential Sequential_File_0)
[pp] eSame->eCollectAny op1[1p] (parallel Filter_26)}
ds1: {op2[3p] (parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)}
ds2: {op1[1p] (parallel Filter_26)
[pp] ->
/bis_data/datasets/dataset_test.ds}
It has 4 operators:
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[1p] {(parallel Filter_26)
on nodes (
node2[op1,p0]
)}
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
It runs 6 processes on 3 nodes.

#2:

For SeqFile --> Keep File partition = False

For Dataset : --> Partition = Auto

main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential Sequential_File_0)
eAny<>eCollectAny op1[3p] (parallel Filter_26)}
ds1: {op2[3p] (parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)}
ds2: {op1[3p] (parallel Filter_26)
=>
/bis_data/datasets/dataset_test.ds}
It has 4 operators:
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[3p] {(parallel Filter_26)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
)}
op2[3p] {(parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
It runs 8 processes on 3 nodes.

Especially the bold points.

Post by **neilbeckwith** » Wed Mar 28, 2012 3:33 pm

What's the question?

In #1, you have preserve partitioning set ([pp] in the score), i.e. the Sequential File stage is requesting that the Filter preserve its sequential operation.
[pp] eSame->eCollectAny
shows this, -> means sequential to sequential, and
)}
op1[1p] {(parallel Filter_26)
on nodes (
node2[op1,p0]
)}
shows the Filter is doing so and running sequentially.

In #2, it is Auto partitioning (in this case Round Robin) and is running Sequential to Parallel.
eAny<>eCollectAny
shows this, <> means sequential to parallel, and
)}
op1[3p] {(parallel Filter_26)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
)}
shows the Filter is running in parallel.
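The link symbols discussed above can be collected into a small lookup. This is only a sketch: the thread confirms `->` and `<>`, while the meanings given for `=>`, `#>` and `>>` are assumptions based on the usual score legend and should be checked against the IBM documentation. The names `LINK_SYMBOLS` and `describe_link` are hypothetical.

```python
import re

# Legend for link symbols in a score dump.
# "->" and "<>" are explained in this thread; the other three entries
# are assumptions to verify against the product documentation.
LINK_SYMBOLS = {
    "->": "sequential to sequential",
    "<>": "sequential to parallel",
    "=>": "parallel to parallel (same partitioning)",
    "#>": "parallel to parallel (repartitioned)",
    ">>": "parallel to sequential",
}

_SYMBOL_RE = re.compile(r"->|<>|=>|#>|>>")


def describe_link(score_line):
    """Return (symbol, meaning) for the first link symbol found, else None."""
    match = _SYMBOL_RE.search(score_line)
    if match is None:
        return None
    symbol = match.group(0)
    return symbol, LINK_SYMBOLS[symbol]


if __name__ == "__main__":
    # Link lines copied from the two score dumps in this thread.
    print(describe_link("[pp] eSame->eCollectAny op1[1p] (parallel Filter_26)}"))
    print(describe_link("eAny<>eCollectAny op1[3p] (parallel Filter_26)}"))
```

Lines without a link symbol, such as the operator headers, return `None`.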

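Therefore the two scores differ by 2 player processes, as reported: 6 in #1 versus 8 in #2, because Filter_26 runs with 1 partition in #1 and 3 partitions in #2.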
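The process counts can be cross-checked by adding up the partition counts in the operator headers. The sketch below assumes the reported count equals the sum of the `[Np]` values on the `opN[Np] {` lines, which holds for both dumps in this thread; `count_player_processes` is a hypothetical helper, not part of any DataStage tool.

```python
import re

# Matches operator definition headers such as "op1[3p] {(parallel Filter_26)".
# Requiring the "{" after the bracket skips the dataset lines, where operators
# are written as "op1[3p] (parallel Filter_26)}".
OPERATOR_RE = re.compile(r"^\s*op\d+\[(\d+)p\]\s*\{")


def count_player_processes(score_text):
    """Sum the partition counts of all operator definitions in a score dump."""
    total = 0
    for line in score_text.splitlines():
        match = OPERATOR_RE.match(line)
        if match:
            total += int(match.group(1))
    return total


# Operator headers copied from the two dumps in this thread.
SCORE_1 = """\
op0[1p] {(sequential Sequential_File_0)
op1[1p] {(parallel Filter_26)
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
"""

SCORE_2 = """\
op0[1p] {(sequential Sequential_File_0)
op1[3p] {(parallel Filter_26)
op2[3p] {(parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
"""

if __name__ == "__main__":
    print(count_player_processes(SCORE_1))  # 1 + 1 + 3 + 1 = 6
    print(count_player_processes(SCORE_2))  # 1 + 3 + 3 + 1 = 8
```

The only header that differs between the two inputs is `op1`, which accounts for the gap of two processes.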

Post by **qt_ky** » Wed Mar 28, 2012 6:14 pm

Filter_26 from the first example is running in parallel on a single node, which is effectively running sequentially.

Post by **ray.wurlod** » Wed Mar 28, 2012 9:12 pm

... which suggests that a node pool may be involved. Or that the Data Set was written previously using a different configuration file (with three nodes) and this job is running using a single node configuration file.