Understanding SCORE

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

All,

I have two sets of scores for the same job:

Seq File --> Filter --> Dataset
#1:

For Seq File: Keep File partition = True

For Dataset: Partition = Auto
main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential Sequential_File_0)
[pp] eSame->eCollectAny op1[1p] (parallel Filter_26)}
ds1: {op2[3p] (parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)}
ds2: {op1[1p] (parallel Filter_26)
[pp] ->
/bis_data/datasets/dataset_test.ds}
It has 4 operators:
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[1p] {(parallel Filter_26)
on nodes (
node2[op1,p0]
)}
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
It runs 6 processes on 3 nodes.
#2:

For Seq File: Keep File partition = False

For Dataset: Partition = Auto
main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential Sequential_File_0)
eAny<>eCollectAny op1[3p] (parallel Filter_26)}
ds1: {op2[3p] (parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)}
ds2: {op1[3p] (parallel Filter_26)
=>
/bis_data/datasets/dataset_test.ds}
It has 4 operators:
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[3p] {(parallel Filter_26)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
)}
op2[3p] {(parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
It runs 8 processes on 3 nodes.
Could someone help me understand these two scores, especially the bold points?
Karthik
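One way to cross-check the "It runs N processes" line: in both scores above, that total equals the sum of the [Np] partition counts in the operator headers (1+1+3+1 = 6 and 1+3+3+1 = 8). A minimal Python sketch, assuming the score text has been copied out of the job log as plain text; the helper names (operators, total_processes) are hypothetical and not part of any DataStage API.

```python
import re

# Operator headers in the "It has N operators:" section look like
# "op1[3p] {(parallel Filter_26)"; "[3p]" is the partition (player) count.
OP_HEADER = re.compile(r"^op(\d+)\[(\d+)p\]\s*\{\((sequential|parallel) (.*)\)$")

# Operator headers copied verbatim from the two scores above.
SCORE_1 = """
op0[1p] {(sequential Sequential_File_0)
op1[1p] {(parallel Filter_26)
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
"""

SCORE_2 = """
op0[1p] {(sequential Sequential_File_0)
op1[3p] {(parallel Filter_26)
op2[3p] {(parallel delete data files in delete /bis_data/datasets/dataset_test.ds)
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
"""


def operators(score):
    """Return (op id, name, partition count) for each operator header (hypothetical helper)."""
    found = []
    for line in score.splitlines():
        m = OP_HEADER.match(line.strip())
        if m:
            op_id, nparts, _mode, name = m.groups()
            found.append((int(op_id), name, int(nparts)))
    return found


def total_processes(score):
    """Sum the partition counts; matches "It runs N processes" for these two scores."""
    return sum(nparts for _, _, nparts in operators(score))


print(total_processes(SCORE_1))  # 6
print(total_processes(SCORE_2))  # 8
```

Comparing operators(SCORE_1) with operators(SCORE_2) shows that the only change is Filter_26 going from [1p] to [3p], which accounts for the 2 extra processes.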
neilbeckwith
Participant
Posts: 6
Joined: Wed Oct 23, 2002 11:10 pm
Location: Melbourne, Australia

Post by neilbeckwith »

What's the question ?

In #1, you have preserve partitioning set ([pp] in the score), i.e. the Sequential File stage is requesting that the Filter preserve sequential operation.
[pp] eSame->eCollectAny
shows this (-> means sequential to sequential), and
op1[1p] {(parallel Filter_26)
on nodes (
node2[op1,p0]
)}

shows the Filter is doing so and running sequentially.

In #2, it is Auto partitioning (in this case Round Robin), running sequential to parallel.
eAny<>eCollectAny
shows this (<> means sequential to parallel), and
op1[3p] {(parallel Filter_26)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
)}

shows the Filter is running in parallel.
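The link notation in a score can also be decoded mechanically. A minimal Python sketch covering only the four symbols that occur in the two scores in this thread: -> and <> as described above. The meanings given for >> (op2[3p] feeding op3[1p]) and => (op1[3p] writing the data set in #2) are inferred from the partition counts on either side and are an assumption, as is reading the left-hand name (eSame, eAny) as the partitioning method and the right-hand name (eCollectAny) as the collection method. decode_link is a hypothetical helper.

```python
import re

# "->" and "<>" as explained in the reply; ">>" and "=>" are inferred
# (assumption) from the partition counts around them in these two scores.
LINK_SYMBOLS = {
    "->": "sequential to sequential",
    "<>": "sequential to parallel",
    ">>": "parallel to sequential",
    "=>": "parallel to parallel",
}

# Optional [pp] flag, optional left-hand name (assumed partitioner),
# link symbol, optional right-hand name (assumed collector).
LINK_RE = re.compile(r"^(\[pp\]\s*)?(\w*)(->|<>|>>|=>)(\w*)$")


def decode_link(line):
    """Split one score link line into its parts (hypothetical helper)."""
    m = LINK_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a link line: {line!r}")
    pp, partitioner, symbol, collector = m.groups()
    return {
        "preserve_partitioning": pp is not None,
        "partitioner": partitioner or None,
        "collector": collector or None,
        "direction": LINK_SYMBOLS[symbol],
    }


# Link lines copied from the two scores above.
for link in ["[pp] eSame->eCollectAny", "eAny<>eCollectAny", ">>eCollectAny", "[pp] ->", "=>"]:
    print(link, decode_link(link))
```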

Therefore the difference of 2 player processes in the reported totals (6 vs 8).
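For #2, the Round Robin partitioning that Auto resolved to deals the records from the single Sequential_File_0 process to the three Filter_26 partitions in turn. A minimal Python sketch of that idea, not DataStage's own implementation; the seven input records are made up for illustration, and the partition-to-node mapping is taken from node1[op1,p0], node2[op1,p1] and node3[op1,p2] in score #2.

```python
def round_robin(rows, num_partitions):
    """Deal rows to partitions in turn: row i goes to partition i % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


# Hypothetical input: 7 records read by the sequential op0[1p].
records = [f"rec{n}" for n in range(7)]

# Score #2 places Filter_26 partition p0 on node1, p1 on node2, p2 on node3.
for p, part in enumerate(round_robin(records, 3)):
    print(f"Filter_26 p{p} on node{p + 1}: {part}")
# Filter_26 p0 on node1: ['rec0', 'rec3', 'rec6']
# Filter_26 p1 on node2: ['rec1', 'rec4']
# Filter_26 p2 on node3: ['rec2', 'rec5']
```

Each of the three Filter_26 partitions is a separate player process, which is where the extra processes in #2 come from.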
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Filter_26 from the first example is running in parallel on a single node, which is effectively running sequentially. :wink:
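That situation can be spotted programmatically: look for operators declared parallel whose "on nodes" list holds only one partition. A minimal Python sketch, run on the operator section of score #1; placements and effectively_sequential are hypothetical helpers, and the parsing assumes the score layout shown in this thread.

```python
import re

HEADER = re.compile(r"^op\d+\[\d+p\]\s*\{\((sequential|parallel) (.*)\)$")
PLACEMENT = re.compile(r"^(\w+)\[op\d+,p\d+\]$")  # e.g. "node2[op1,p0]"

# Operator section copied from score #1 above.
SCORE_1 = """
op0[1p] {(sequential Sequential_File_0)
on nodes (
node1[op0,p0]
)}
op1[1p] {(parallel Filter_26)
on nodes (
node2[op1,p0]
)}
op2[3p] {(parallel delete data files in delete /bis_data/dataset_test.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
)}
op3[1p] {(sequential delete descriptor file in delete /bis_data/datasets/dataset_test.ds)
on nodes (
node1[op3,p0]
)}
"""


def placements(score):
    """Map operator name -> (mode, node of each partition), in score order."""
    result = {}
    current = None
    for raw in score.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            mode, current = header.groups()
            result[current] = (mode, [])
            continue
        node = PLACEMENT.match(line)
        if node and current is not None:
            result[current][1].append(node.group(1))
    return result


def effectively_sequential(score):
    """Names of operators declared parallel but placed as a single partition."""
    return [
        name
        for name, (mode, nodes) in placements(score).items()
        if mode == "parallel" and len(nodes) == 1
    ]


print(effectively_sequential(SCORE_1))  # ['Filter_26']
```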
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

... which suggests that a node pool may be involved, or that the Data Set was previously written using a different configuration file (with three nodes) while this job is running with a single-node configuration file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.