CopyTable1 - From source to target with copy stage
------------------------------------------------------------------
ORACLE--------->COPY------------->DATASET
CopyTable2 - Direct from source to target
-------------------------------------------------
ORACLE---------------------------------------->DATASET
And here are the generated scores:
CopyTable1 - From source to target with copy stage
---------------------------------------------------------
main_program: This step has 3 datasets:
ds0: {op0[2p] (parallel srcCountries)
eAny=>eCollectAny
op1[2p] (parallel inCntry_outCntry)}
ds1: {op2[2p] (parallel delete data files in delete D:/Projects/Devl/scratch/coyTable1.ds)
>>eCollectAny
op3[1p] (sequential delete descriptor file in delete D:/Projects/Devl/scratch/coyTable1.ds)}
ds2: {op1[2p] (parallel inCntry_outCntry)
=>
D:/Projects/Devl/scratch/coyTable1.ds}
It has 4 operators:
op0[2p] {(parallel srcCountries)
on nodes (
node1[op0,p0]
node2[op0,p1]
)}
op1[2p] {(parallel inCntry_outCntry)
on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[2p] {(parallel delete data files in delete D:/Projects/Devl/scratch/coyTable1.ds)
on nodes (
node1[op2,p0]
node2[op2,p1]
)}
op3[1p] {(sequential delete descriptor file in delete D:/Projects/Devl/scratch/coyTable1.ds)
on nodes (
node1[op3,p0]
)}
It runs 7 processes on 2 nodes.
CopyTable2 - direct from source to target
----------------------------------------
main_program: This step has 2 datasets:
ds0: {op1[2p] (parallel delete data files in delete D:/Projects/Devl/scratchcopyTable2.ds)
>>eCollectAny
op2[1p] (sequential delete descriptor file in delete D:/Projects/Devl/scratchcopyTable2.ds)}
ds1: {op0[2p] (parallel srcCountries)
=>
D:/Projects/Devl/scratchcopyTable2.ds}
It has 3 operators:
op0[2p] {(parallel srcCountries)
on nodes (
node1[op0,p0]
node2[op0,p1]
)}
op1[2p] {(parallel delete data files in delete D:/Projects/Devl/scratchcopyTable2.ds)
on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[1p] {(sequential delete descriptor file in delete D:/Projects/Devl/scratchcopyTable2.ds)
on nodes (
node1[op2,p0]
)}
It runs 5 processes on 2 nodes.
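In both scores, the reported process count equals the sum of the per-operator partition counts (2+2+2+1 = 7 and 2+2+1 = 5). Apart from the dataset-cleanup operators, the only difference is the extra inCntry_outCntry operator in CopyTable1. The Python sketch below checks this by pulling the `opN[kp] {(...)` lines out of a score dump. The regex and the helper name `summarize_score` are assumptions based only on the lines shown here, not on any documented score format.

```python
import re

# Operator definition lines in the score dumps above look like
#   op0[2p] {(parallel srcCountries)
# Lines inside dataset blocks, e.g. "op1[2p] (parallel inCntry_outCntry)}",
# have no "{(" after the partition count and are deliberately not matched.
OP_LINE = re.compile(r"^\s*(op\d+)\[(\d+)p\]\s*\{\((\w+)\s+(.*)\)\s*$")


def summarize_score(score_text):
    """Return ({op_id: (partitions, name)}, total_partitions) for a score dump."""
    ops = {}
    for line in score_text.splitlines():
        m = OP_LINE.match(line)
        if m:
            op_id, parts, _mode, name = m.groups()
            ops[op_id] = (int(parts), name)
    total = sum(parts for parts, _ in ops.values())
    return ops, total


# Lines copied from the two scores above (node lists omitted).
SCORE1 = """\
ds0: {op0[2p] (parallel srcCountries)
op1[2p] (parallel inCntry_outCntry)}
op0[2p] {(parallel srcCountries)
op1[2p] {(parallel inCntry_outCntry)
op2[2p] {(parallel delete data files in delete D:/Projects/Devl/scratch/coyTable1.ds)
op3[1p] {(sequential delete descriptor file in delete D:/Projects/Devl/scratch/coyTable1.ds)
"""

SCORE2 = """\
op0[2p] {(parallel srcCountries)
op1[2p] {(parallel delete data files in delete D:/Projects/Devl/scratchcopyTable2.ds)
op2[1p] {(sequential delete descriptor file in delete D:/Projects/Devl/scratchcopyTable2.ds)
"""

ops1, total1 = summarize_score(SCORE1)
ops2, total2 = summarize_score(SCORE2)
names1 = {name for _, name in ops1.values()}
names2 = {name for _, name in ops2.values()}
# Ignore the dataset-cleanup operators, whose names differ only by target path.
extra = sorted(n for n in names1 - names2 if not n.startswith("delete"))
print(total1, total2, extra)  # prints: 7 5 ['inCntry_outCntry']
```

The totals match the "It runs 7 processes" and "It runs 5 processes" lines. The two extra processes in CopyTable1 come from the two partitions of the copy operator.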
Don't know where you read that. The Copy stage is compiled out if it's not needed (that is, if it makes an identical copy of its input). Force compile has nothing whatsoever to do with it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I stand corrected. It's compiled out if it's not needed AND the Force property is set to False.
(I'm not referring to force compile; I mean the Force property in the Copy stage.)
In this case, it's a straight copy. No changes to names, no columns dropped, no change to the order of columns.
The manual states:
Where you are using a Copy stage with a single input and a single output, you should ensure that you set the Force property in the stage editor TRUE. This prevents WebSphere DataStage from deciding that the Copy operation is superfluous and optimizing it out of the job.
So I'm still curious to know why it isn't optimised out of the job.
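The toy Python model below encodes the rule as the manual quote and the preceding reply describe it. The class and field names are hypothetical, and this is not DataStage's actual optimizer logic. By this model, CopyTable1's Copy stage should be removed, which is exactly what the score contradicts.

```python
from dataclasses import dataclass


@dataclass
class CopyStageConfig:
    # Hypothetical summary of a Copy stage; these field names are illustrative,
    # not real DataStage property names.
    inputs: int
    outputs: int
    force: bool            # the Copy stage's Force property
    changes_columns: bool  # any rename, drop or reordering of columns


def may_be_optimized_out(stage: CopyStageConfig) -> bool:
    """Toy model of the rule as quoted in this thread, not the real compiler:
    a Copy stage is a removal candidate only with one input, one output,
    Force set to False, and columns passed through unchanged."""
    return (
        stage.inputs == 1
        and stage.outputs == 1
        and not stage.force
        and not stage.changes_columns
    )


# CopyTable1's Copy stage as described: a straight copy with Force = False.
copy_table1 = CopyStageConfig(inputs=1, outputs=1, force=False, changes_columns=False)
print(may_be_optimized_out(copy_table1))  # True, yet inCntry_outCntry is in the score
```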
Ah, the Force option in the Copy stage itself. Well, that's basically there so that you can require the compiler to include a copy operator even though one is not, strictly speaking, needed. Another way is to use the Copy stage to do something, even something trivial like renaming a column. You say that this does not apply.
In your score for CopyTable1, I see that the copy operator (inCntry_outCntry) does appear to be present, which I would only expect to see if the Copy stage's Force option were set to True. And you say that it isn't. Would you mind checking?
Maybe the job compiler in version 8.x does things a little differently.
I tried it again just now with a completely new job on a different server (same OS and DataStage version, but with 4 nodes), after checking that the Force option on the Copy stage is False.
Don't know. Can we see the generated OSH? I'm particularly interested in the record schemas. Obfuscate the column names if need be, but preserve the one-to-one relationship between real names and obfuscated names.
The only thing I can suggest is that the Copy stage is kept because its output does not connect to a virtual Data Set. Indeed, the operator associated with the Data Set stage is copy, which may mean that "your" Copy stage was eliminated but there is still a copy operator to transfer data to the Data Set. Test this theory by placing any stage type (except a Modify stage or another Copy stage) between the Copy stage and the Data Set stage.