APT_DUMP_SCORE output

Post questions here relating to DataStage Enterprise/PX Edition in areas such as parallel job design, parallel data sets, BuildOps, Wrappers, etc.

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1) Links (virtual Data Sets) are referred to as Data Sets. Persistent Data Sets are also referred to as Data Sets. Files are not referred to as Data Sets.

2) ds0, ds1, ds2, op0, op1, op2 and op3 are logical names used internally: how they map onto design objects is specified in the score.
p0, p1, and so on refer to processes. Thus op1 p0 is the first (player) process executing op1, op1 p1 is the second (player) process executing op1. op1[2p] means that op1 will execute using two player processes.

3) eAny, eRoundRobin, etc. are partitioning algorithms (eAny means "Auto"); similarly, eCollectAny, eCollectRoundRobin, etc. are collector algorithms. The symbols between them also have meaning: for example, <> is sequential to parallel, => is parallel to parallel (Same), >> is parallel to parallel (other than Same), and so on.

4) You can read about implied operators in the Orchestrate manuals. It is probably OK to think of the names as names DataStage gives to the generated operators when the mapping is not truly one-to-one between stage and operator, for example in the case of composite operators.
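The naming conventions in points 2 and 3 can be illustrated with a small Python parser. This is only a sketch, assuming score fragments look exactly like the ones quoted in this thread (`op1[2p]`, `node1[op0,p0]`, `eAny<>eCollectAny`); the regular expressions and the helper names `parse_operator_degree`, `parse_node_slot` and `parse_link` are hypothetical, not part of DataStage. The symbol table holds only the `<>` and `=>` entries from point 3 and would need extending for other link symbols.

```python
import re

# Partial table of link symbols, taken from point 3 above (hypothetical helper data).
LINK_SYMBOLS = {
    "<>": "sequential to parallel",
    "=>": "parallel to parallel (Same)",
}

OP_DEGREE = re.compile(r"^(op\d+)\[(\d+)p\]")          # e.g. op1[2p]
NODE_SLOT = re.compile(r"^(\w+)\[(op\d+),p(\d+)\]$")   # e.g. node1[op0,p0]
LINK = re.compile(r"^(e\w+)(<>|=>)(eCollect\w+)$")     # e.g. eAny<>eCollectAny


def parse_operator_degree(text):
    """'op1[2p]' -> ('op1', 2): logical operator name and number of player processes."""
    m = OP_DEGREE.match(text.strip())
    if m is None:
        raise ValueError(f"not an operator reference: {text!r}")
    return m.group(1), int(m.group(2))


def parse_node_slot(text):
    """'node1[op0,p0]' -> ('node1', 'op0', 0): node, operator, player process index."""
    m = NODE_SLOT.match(text.strip())
    if m is None:
        raise ValueError(f"not a node slot: {text!r}")
    return m.group(1), m.group(2), int(m.group(3))


def parse_link(text):
    """'eAny<>eCollectAny' -> (partitioner, meaning of the link symbol, collector)."""
    m = LINK.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised link: {text!r}")
    return m.group(1), LINK_SYMBOLS[m.group(2)], m.group(3)


if __name__ == "__main__":
    print(parse_operator_degree("op1[2p]"))       # ('op1', 2)
    print(parse_node_slot("node1[op0,p0]"))       # ('node1', 'op0', 0)
    print(parse_link("eAny<>eCollectAny"))        # ('eAny', 'sequential to parallel', 'eCollectAny')
```

Because `OP_DEGREE` is anchored only at the start, it also accepts a full score line such as `op0[1p] {(sequential APT_CombinedOperatorController:`.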

Enrol to take the IBM Advanced DataStage class (code DX436) or my Advanced Parallel Job Techniques class. Each has about a half day on interpreting the score.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

All three Sequential File stages run in sequential mode, so each is a single process; the Transformer runs in parallel mode and hence with two processes.
That makes a total of 5 processes.
There are three links, and hence 3 virtual data sets.
Here the 4 stages map to 4 operators, although the mapping is not always one-to-one.
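The count above is simple arithmetic: a stage running in sequential mode is one player process, and a stage running in parallel mode gets one player process per partition. A minimal Python sketch, assuming a two-node configuration (so the Transformer has two partitions) and counting player processes only; the stage names are hypothetical, since the job design itself was not posted:

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallel: bool
    partitions: int = 1  # degree of parallelism when running in parallel mode

    def processes(self) -> int:
        # A sequential operator always runs as a single player process.
        return self.partitions if self.parallel else 1


def total_processes(operators):
    return sum(op.processes() for op in operators)


# Hypothetical job: three Sequential File stages and one Transformer (3 links).
job = [
    Operator("SeqFile_In", parallel=False),
    Operator("Transformer", parallel=True, partitions=2),
    Operator("SeqFile_Out1", parallel=False),
    Operator("SeqFile_Out2", parallel=False),
]

print(len(job))              # 4 operators
print(total_processes(job))  # 5 processes
```

With a four-node configuration (`partitions=4`) the same 4 operators would give 7 processes, which is why the process count in a score depends on the configuration file as well as the job design.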
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Ray and Kumar_s, thank you for your responses. I have a few follow-up questions:

1) So are all files, both sequential files and data sets, converted to virtual data sets?

3) So eAny <> eCollectAny means:
from Sequential TO Parallel AND
from Auto Partitioning TO Auto Collecting? That doesn't make sense, because the link marker between these 2 stages (the first Sequential File stage and the parallel stage) is an open "flower" (fan-out), meaning the flow goes from regular to partitioned.
abc123

Post by abc123 »

I have another job as follows:

SeqFile1 ---> RemoveDuplicates ----> SeqFile2

The data flow goes from regular to partitioned to regular.

Here is the score output:
1)main_program: This step has no datasets.
2)It has 1 operator:
3)op0[1p] {(sequential APT_CombinedOperatorController:
4) (APT_LicenseCountOp in APT_LicenseOperator)
5) (APT_LicenseCheckOp in APT_LicenseOperator)
6) ) on nodes (
7) node1[op0,p0]
8) )}
9)It runs 1 process on 1 node.

My questions are:
1) This job has 2 Sequential File stages and, as a result, 2 links. Shouldn't there be 2 data sets?

2) I thought each stage was an operator. Shouldn't there be 3?

3) What is "sequential APT_CombinedOperatorController"? Is it saying that the first Sequential File stage and the RemoveDuplicates stage are combined? Why does DataStage use the name APT_CombinedOperatorController?

9) Since RemoveDuplicates runs in parallel, I would expect 4 processes.
kumar_s

Post by kumar_s »

abc123 wrote:
1) So are all files, both sequential files and data sets, converted to virtual data sets?

3) So eAny <> eCollectAny means:
from Sequential TO Parallel AND
from Auto Partitioning TO Auto Collecting? That doesn't make sense, because the link marker between these 2 stages (the first Sequential File stage and the parallel stage) is an open "flower" (fan-out), meaning the flow goes from regular to partitioned.
1. All the data carried on a link is a virtual data set. It becomes persistent only if it is written to disk.
3. I'm not sure about this one. Let's wait until Ray gets back.
kumar_s

Post by kumar_s »

abc123 wrote: I have another job as follows:

SeqFile1 ---> RemoveDuplicates ----> SeqFile2
Here, because of the combinability mode, all the operators were combined into a single operator under APT_CombinedOperatorController. It may be doing a sequential read with a unique option on the key to remove duplicates, piped to the output.
Please set APT_DISABLE_COMBINATION to True, rerun the job and post the result.
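Operator combination can be pictured as fusing runs of adjacent combinable operators into one controller operator. The Python below is a deliberately simplified model, not the engine's actual algorithm: it assumes a linear flow and a single hypothetical `combinable` flag per operator, and the `disable_combination` argument only plays the role of APT_DISABLE_COMBINATION. The operator names are the two that appear in the scores in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    combinable: bool = True  # hypothetical flag; the real combination rules are more involved


def combine(ops: List[Op], disable_combination: bool = False) -> List[List[str]]:
    """Group adjacent combinable operators; each group becomes one operator in the score."""
    if disable_combination:
        return [[op.name] for op in ops]
    groups: List[List[str]] = []
    prev_combinable = False
    for op in ops:
        if op.combinable and prev_combinable:
            groups[-1].append(op.name)  # fuse into the current combined controller
        else:
            groups.append([op.name])    # start a new score operator
        prev_combinable = op.combinable
    return groups


def describe(groups: List[List[str]]) -> List[str]:
    return [
        "APT_CombinedOperatorController(" + ", ".join(g) + ")" if len(g) > 1 else g[0]
        for g in groups
    ]


flow = [Op("APT_LicenseCountOp"), Op("APT_LicenseCheckOp")]
print(describe(combine(flow)))                            # one combined operator
print(describe(combine(flow, disable_combination=True)))  # two separate operators
```

In a linear flow the number of data sets between score operators is one less than the number of groups, which matches the 0 versus 1 datasets in the combined and uncombined scores in this thread.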
abc123

Post by abc123 »

main_program: This step has 1 dataset:
ds0: {op0[1p] (parallel APT_LicenseCountOp in APT_LicenseOperator)
->eCollectAny
op1[1p] (sequential APT_LicenseCheckOp in APT_LicenseOperator)}
It has 2 operators:
op0[1p] {(parallel APT_LicenseCountOp in APT_LicenseOperator)
on nodes (
node1[op0,p0]
)}
op1[1p] {(sequential APT_LicenseCheckOp in APT_LicenseOperator)
on nodes (
node2[op1,p0]
)}
It runs 2 processes on 2 nodes.
---------------------------------------------
The number of operators has increased from 1 to 2, and the number of datasets has increased
from 0 to 1.
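This before-and-after comparison can be made by reading only the summary lines of each score. A Python sketch, assuming the summary lines are worded exactly as in the two scores posted in this thread ("This step has no datasets.", "It has 2 operators:", "It runs 2 processes on 2 nodes."); `summarize_score` is a hypothetical helper, not a DataStage utility:

```python
import re

DATASETS = re.compile(r"This step has (no|\d+) datasets?")
OPERATORS = re.compile(r"It has (\d+) operators?")
PROCESSES = re.compile(r"It runs (\d+) process(?:es)? on (\d+) nodes?")


def _groups(pattern, text):
    m = pattern.search(text)
    if m is None:
        raise ValueError(f"summary line not found: {pattern.pattern}")
    return m.groups()


def summarize_score(text):
    """Extract the dataset, operator, process and node counts from a score's summary lines."""
    (ds,) = _groups(DATASETS, text)
    (ops,) = _groups(OPERATORS, text)
    procs, nodes = _groups(PROCESSES, text)
    return {
        "datasets": 0 if ds == "no" else int(ds),
        "operators": int(ops),
        "processes": int(procs),
        "nodes": int(nodes),
    }


combined = """main_program: This step has no datasets.
It has 1 operator:
It runs 1 process on 1 node."""

uncombined = """main_program: This step has 1 dataset:
It has 2 operators:
It runs 2 processes on 2 nodes."""

print(summarize_score(combined))    # {'datasets': 0, 'operators': 1, 'processes': 1, 'nodes': 1}
print(summarize_score(uncombined))  # {'datasets': 1, 'operators': 2, 'processes': 2, 'nodes': 2}
```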
ray.wurlod

Post by ray.wurlod »

eCollectAny has no effect when the receiving stage operates in parallel mode.
abc123

Post by abc123 »

Ray, could you answer my questions 1, 2, 3 and 9 (numbered after the score lines they refer to) about the SeqFile1 ---> RemoveDuplicates ----> SeqFile2 job I posted above?
Thanks.
kumar_s

Post by kumar_s »

I guess you are using Auto partitioning in the Remove Duplicates stage. It might be using SAME partitioning, i.e., operating in sequential mode: "->eCollectAny".