
Posted: Wed Feb 28, 2007 12:06 am
by ray.wurlod
1) LINKS (virtual Data Sets) are referred to as Data Sets. Persistent Data Sets are referred to as Data Sets. Files are not referred to as Data Sets.

2) ds0, ds1, ds2, op0, op1, op2 and op3 are logical names used internally: how they map onto design objects is specified in the score.
p0, p1, and so on refer to processes. Thus op1 p0 is the first (player) process executing op1, op1 p1 is the second (player) process executing op1. op1[2p] means that op1 will execute using two player processes.

3) eAny, eRoundRobin, etc. are partitioning algorithms (eAny means "Auto"); similarly eCollectAny, eCollectRoundRobin, etc., are collector algorithms. The symbols between them also have meaning: for example, <> is sequential to parallel, => is parallel to parallel (Same), #> is parallel to parallel (other than Same), >> is parallel to sequential, etc.

4) You can read about implied operators in the Orchestrate manuals. It is probably OK to think of the names as names DataStage gives to the generated operators when the mapping is not truly one-to-one between stage and operator, for example in the case of composite operators.
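
To make the notation in points 2 and 3 concrete, here is a minimal Python sketch that splits score tokens into their parts. The function names, the regular expressions and the small label table are illustrative assumptions, not part of DataStage or Orchestrate; only the eAny = "Auto" mapping is taken from the explanation above, and the link symbol is returned as-is without being interpreted.

```python
import re

# Hypothetical helpers for reading score tokens; this is not a DataStage API.
OP_RE = re.compile(r"^op(\d+)\[(\d+)p\]$")  # e.g. op1[2p]
# Optional partitioner, a run of punctuation (the link symbol), optional collector.
LINK_RE = re.compile(r"^(e\w+)?\s*([^\w\s]+)\s*(eCollect\w+)?$")

# Only this mapping is stated in the thread; extend it at your own risk.
PARTITIONER_LABELS = {"eAny": "Auto"}


def parse_operator(token):
    """Return (operator_index, player_count) for a token such as 'op1[2p]'."""
    m = OP_RE.match(token.strip())
    if m is None:
        raise ValueError(f"not an operator token: {token!r}")
    return int(m.group(1)), int(m.group(2))


def player_names(token):
    """List the player processes, e.g. 'op1[2p]' -> ['op1 p0', 'op1 p1']."""
    op, players = parse_operator(token)
    return [f"op{op} p{i}" for i in range(players)]


def parse_link(text):
    """Split text such as 'eAny<>eCollectAny' into partitioner, symbol, collector."""
    m = LINK_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a link description: {text!r}")
    partitioner, symbol, collector = m.groups()
    return {"partitioner": partitioner, "symbol": symbol, "collector": collector}


if __name__ == "__main__":
    print(parse_operator("op1[2p]"))
    print(player_names("op1[2p]"))
    print(parse_link("eAny<>eCollectAny"))
```

A line with no partitioner, such as "->eCollectAny", yields None for the partitioner field.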

Enrol to take the IBM Advanced DataStage class (code DX436) or my Advanced Parallel Job Techniques class. Each has about a half day on interpreting the score.

Posted: Wed Feb 28, 2007 7:24 am
by kumar_s
Each of the three Sequential File stages runs as a single process in sequential mode, while the Transformer runs in parallel mode and hence with multiple processes.
Total of 5 processes.
Three links and hence three virtual Data Sets.
Here 4 stages are mapped to 4 operators. The mapping may not always be one-to-one.
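
As a quick check of that arithmetic, here is a small Python sketch. The stage names are hypothetical, and the Transformer's player count of 2 is an assumption inferred from the stated total of 5; neither is read from the actual score.

```python
# Assumed player counts per operator (hypothetical names, not from the score).
players = {
    "seq_file_1": 1,   # sequential mode: one process
    "seq_file_2": 1,
    "seq_file_3": 1,
    "transformer": 2,  # assumption: parallel on two nodes, inferred from the total
}

total_processes = sum(players.values())
print(total_processes)  # prints 5
```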

Posted: Wed Feb 28, 2007 10:24 am
by abc123
Ray and Kumar_s, thank you for your response. I have a few follow up questions:

1)So are all files, sequential and datasets, converted to virtual datasets?

3)So eAny <> eCollectAny means:
from Sequential TO Parallel AND
from Auto Partitioning TO Auto Collecting? That doesn't make sense because the flow is an open flower, which is, from regular to partitioning between these 2 stages (first sequential file stage and the parallel stage).

Posted: Wed Feb 28, 2007 10:52 am
by abc123
I have another job as follows:

SeqFile1 ---> RemoveDuplicates ----> SeqFile2

The data flow goes from regular to partitioned to regular.

Here is the score output:
1)main_program: This step has no datasets.
2)It has 1 operator:
3)op0[1p] {(sequential APT_CombinedOperatorController:
4) (APT_LicenseCountOp in APT_LicenseOperator)
5) (APT_LicenseCheckOp in APT_LicenseOperator)
6) ) on nodes (
7) node1[op0,p0]
8) )}
9)It runs 1 process on 1 node.

My questions are:
1) This job has 2 Sequential File stages and, as a result, 2 links. Shouldn't there be 2 datasets?

2) I thought each stage was an operator. Shouldn't there be 3?

3) What is "sequential APT_CombinedOperatorController"? Is it saying that the first Sequential File stage and the RemoveDuplicates stage are combined? Why does DataStage use the name APT_CombinedOperatorController?

9) Since RemoveDuplicates runs in parallel, I would think that there should be 4 processes.

Posted: Wed Feb 28, 2007 1:32 pm
by kumar_s
abc123 wrote:Ray and Kumar_s, thank you for your response. I have a few follow up questions:

1)So are all files, sequential and datasets, converted to virtual datasets?

3)So eAny <> eCollectAny means:
from Sequential TO Parallel AND
from Auto Partitioning TO Auto Collecting? That doesn't make sense because the flow is an open flower, which is, from regular to partitioning between these 2 stages (first sequential file stage and the parallel stage).
1. All data carried on a link is a virtual Data Set. It becomes persistent only if it is written to disk.
2. Not sure about this. We can wait until Ray gets back.

Posted: Wed Feb 28, 2007 1:37 pm
by kumar_s
abc123 wrote:I have another job as follows:

SeqFile1 ---> RemoveDuplicates ----> SeqFile2
Here, because operator combination is enabled, all the operators were combined into a single operator under APT_CombinedOperatorController. It may be doing a sequential read with a unique option on the key to avoid duplicates, piped to the output.
Please set APT_DISABLE_COMBINATION to TRUE, rerun the job and post the result.

Posted: Wed Feb 28, 2007 2:26 pm
by abc123
main_program: This step has 1 dataset:
ds0: {op0[1p] (parallel APT_LicenseCountOp in APT_LicenseOperator)
->eCollectAny
op1[1p] (sequential APT_LicenseCheckOp in APT_LicenseOperator)}
It has 2 operators:
op0[1p] {(parallel APT_LicenseCountOp in APT_LicenseOperator)
on nodes (
node1[op0,p0]
)}
op1[1p] {(sequential APT_LicenseCheckOp in APT_LicenseOperator)
on nodes (
node2[op1,p0]
)}
It runs 2 processes on 2 nodes.
---------------------------------------------
The number of operators has increased from 1 to 2, and the number of datasets has increased
from 0 to 1.
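
The same comparison can be made mechanically. Below is a minimal Python sketch that counts datasets, operators and player processes in a score dump. The function name and regular expressions are assumptions based only on the text layout of the two dumps in this thread (with the "1)", "2)" line prefixes removed from the first one); this is not a DataStage utility.

```python
import re

SCORE_COMBINED = """\
main_program: This step has no datasets.
It has 1 operator:
op0[1p] {(sequential APT_CombinedOperatorController:
 (APT_LicenseCountOp in APT_LicenseOperator)
 (APT_LicenseCheckOp in APT_LicenseOperator)
 ) on nodes (
 node1[op0,p0]
 )}
It runs 1 process on 1 node.
"""

SCORE_UNCOMBINED = """\
main_program: This step has 1 dataset:
ds0: {op0[1p] (parallel APT_LicenseCountOp in APT_LicenseOperator)
->eCollectAny
op1[1p] (sequential APT_LicenseCheckOp in APT_LicenseOperator)}
It has 2 operators:
op0[1p] {(parallel APT_LicenseCountOp in APT_LicenseOperator)
on nodes (
node1[op0,p0]
)}
op1[1p] {(sequential APT_LicenseCheckOp in APT_LicenseOperator)
on nodes (
node2[op1,p0]
)}
It runs 2 processes on 2 nodes.
"""


def summarize_score(score_text):
    """Count datasets, operators and player processes in a score dump."""
    # Dataset definitions start a line with 'dsN:'.
    datasets = set(re.findall(r"^\s*(ds\d+):", score_text, re.MULTILINE))
    # Operator definitions start a line with 'opN[Kp] {'; references to an
    # operator inside a dataset definition are not followed by '{'.
    operators = dict(
        re.findall(r"^\s*(op\d+)\[(\d+)p\]\s*\{", score_text, re.MULTILINE)
    )
    processes = sum(int(count) for count in operators.values())
    return {
        "datasets": len(datasets),
        "operators": len(operators),
        "processes": processes,
    }


if __name__ == "__main__":
    print(summarize_score(SCORE_COMBINED))
    print(summarize_score(SCORE_UNCOMBINED))
```

The counts it derives can be compared against the summary lines the score itself prints ("It has 2 operators", "It runs 2 processes on 2 nodes").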

Posted: Wed Feb 28, 2007 3:36 pm
by ray.wurlod
eCollectAny has no effect when the receiving stage operates in parallel mode.

Posted: Wed Feb 28, 2007 4:09 pm
by abc123
Ray, could you answer questions 1, 2, 3 and 9 from my earlier post (Wed Feb 28, 2007 10:52 am) about the SeqFile1 ---> RemoveDuplicates ----> SeqFile2 job?
Thanks.

Posted: Wed Feb 28, 2007 5:43 pm
by kumar_s
I guess you are using Auto partitioning in the Remove Duplicates stage. It might be using Same partitioning, i.e., operating in sequential mode: "->eCollectAny".