I have two sets of SCORE output for a given job.
Seq File --> Filter --> Dataset
For SeqFile --> Keep File partition = True
For Dataset : --> Partition = Auto
#2:main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential Sequential_File_0)
[pp] eSame->eCollectAny op1[1p] (parallel Filter_26)}
ds1: {op2[3p] (parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)}
ds2: {op1[1p] (parallel Filter_26)
[pp] ->
/bis_data/datasets/dataset_test.ds}
It has 4 operators:
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[1p] {(parallel Filter_26)
on nodes (
node2[op1,p0]
)}
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
It runs 6 processes on 3 nodes.
For SeqFile --> Keep File partition = False
For Dataset : --> Partition = Auto
Especially the bold points.
main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential Sequential_File_0)
eAny<>eCollectAny op1[3p] (parallel Filter_26)}
ds1: {op2[3p] (parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)}
ds2: {op1[3p] (parallel Filter_26)
=>
/bis_data/datasets/dataset_test.ds}
It has 4 operators:
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[3p] {(parallel Filter_26)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
)}
op2[3p] {(parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
It runs 8 processes on 3 nodes.
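
In both dumps the reported process count appears to equal the sum of the partition counts in the operator section (1+1+3+1 = 6 with Keep File partition = True, 1+3+3+1 = 8 with False). A minimal Python sketch of that arithmetic follows; the function name count_partitions and the regular expression are illustrative assumptions, not part of any DataStage tooling, and the input strings are just the operator header lines copied from the scores above.

```python
import re

# Matches operator header lines such as "op1[3p] {(parallel Filter_26)".
# Dataset lines ("ds0: {op0[1p] ..." or "op3[1p] (sequential ...") are skipped
# because they do not have "{(" right after the partition count.
OP_HEADER = re.compile(r"^\s*(op\d+)\[(\d+)p\]\s*\{\(")

def count_partitions(score_text):
    """Return {operator name: partition count} from a score dump (hypothetical helper)."""
    counts = {}
    for line in score_text.splitlines():
        match = OP_HEADER.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Operator headers from the run with Keep File partition = True.
score_true = """\
op0[1p] {(sequential Sequential_File_0)
op1[1p] {(parallel Filter_26)
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
"""

# Operator headers from the run with Keep File partition = False.
score_false = """\
op0[1p] {(sequential Sequential_File_0)
op1[3p] {(parallel Filter_26)
op2[3p] {(parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
"""

for label, text in (("True", score_true), ("False", score_false)):
    counts = count_partitions(text)
    print(label, counts, "total processes:", sum(counts.values()))
```

The only operator whose count changes between the two runs is op1 (Filter_26), which goes from 1 partition to 3, and that accounts for the two extra processes.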