How to use Copy stage Force property

Posted: Mon Apr 22, 2013 7:31 am
by Enzopre
Hi all,

I have a question about the Copy stage.

From the documentation, it looks like if the Force property is set to TRUE (FORCE=TRUE), DataStage does not optimize the job by removing the copy operator at compile time; if the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.

What is not clear is when we should set FORCE=TRUE or FORCE=FALSE, and why. More precisely, what happens, and why, if:

1) we have one input link, one output link AND FORCE=TRUE

2) we have one input link, one output link AND FORCE=FALSE

3) we have one input link, multiple output links AND FORCE=TRUE

4) we have one input link, multiple output links AND FORCE=FALSE

Regards.

Posted: Mon Apr 22, 2013 7:45 am
by chulett
Simple enough to test that yourself and see.

Posted: Mon Apr 22, 2013 8:00 am
by Enzopre
Where and what should I look at, precisely?

I have already tested all four cases and, for each case, looked at the generated OSH script and the job score, but I do not see any changes. The copy operator is present in all four cases.

Posted: Mon Apr 22, 2013 8:27 am
by ssreeni3
How many outputs?
One or many?

-----------------------
Thanks,
Ssreeni3

Posted: Mon Apr 22, 2013 8:39 am
by Enzopre
What do you mean?
As I already said, I tested all four cases mentioned above and I do not see any changes... the copy operator is present in all four cases.

Posted: Mon Apr 22, 2013 10:13 am
by priyadarshikunal
You have to look at the job score to see the difference. If you have multiple outputs from the Copy stage, you may not find any difference.

Also, the optimizer will not change your job design, only the way it executes; hence the job score is where you should look.

Posted: Mon Apr 22, 2013 11:02 am
by priyadarshikunal
Also, you might need to set APT_DUMP_SCORE to see the job score, which will make the score visible in the logs.
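
Once the score is in the log, checking whether the Copy stage survived is a matter of reading the operator list. A minimal Python sketch of that check follows. The sample text is modeled on the score layout posted later in this thread, the function names are made up for illustration, and it assumes that a Copy stage which has been optimized out simply does not appear among the operators.

```python
import re

# Sample score text in the layout shown in this thread; paste the real
# text from the job log here instead.
SCORE = """\
It has 3 operators:

op0[1p] {(sequential Sequential_File_1)
    on nodes (
      node1[op0,p0]
    )}

op1[2p] {(parallel Copy_0)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
    )}

op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
    on nodes (
      node2[op2,p0]
    )}

It runs 4 processes on 2 nodes.
"""

# Matches operator header lines such as: op1[2p] {(parallel Copy_0)
OPERATOR_RE = re.compile(
    r"^(op\d+)\[(\d+)p\] \{\((sequential|parallel) (.+)\)\s*$",
    re.MULTILINE,
)


def list_operators(score_text):
    """Return (op id, partition count, mode, name) for each operator."""
    return [
        (m.group(1), int(m.group(2)), m.group(3), m.group(4))
        for m in OPERATOR_RE.finditer(score_text)
    ]


def stage_in_score(score_text, stage_name):
    """True if an operator in the score belongs to the given stage."""
    for _, _, _, name in list_operators(score_text):
        # Some operators are reported as "<operator> in <stage name>".
        if name == stage_name or name.endswith(" in " + stage_name):
            return True
    return False


for operator in list_operators(SCORE):
    print(operator)
print("Copy_0 present:", stage_in_score(SCORE, "Copy_0"))
```

If the stage name is missing from the operator list of a real score, that would suggest the copy was removed at run time, subject to the assumption above.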

Posted: Mon Apr 22, 2013 4:36 pm
by ray.wurlod
Just looking at the row counts ("performance statistics") in Designer will give you a clue. If there are row counts both sides of the Copy stage, then it's in the design and in the runtime.

Posted: Tue Apr 23, 2013 1:36 pm
by Enzopre
So, I have tried to re-compile and re-run in the following two cases:

1) One input link, one output link AND FORCE=FALSE

In this case the OSH script generated at compile time is the following:

Code:

# OSH / orchestrate script for Job Copy_Stage compiled at 09.54.30 23 apr 2013

#################################################################
#### STAGE: Copy_0
## Operator
copy
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
  CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;


#################################################################
#### STAGE: Sequential_File_1
## Operator
import
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file  'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\DataStage87TutorialFiles\\GlobalCo_BillTo_CONVDATE.txt'
-rejects continue
-reportProgress yes
## General options
[ident('Sequential_File_1'); jobmon_ident('Sequential_File_1')]
## Outputs
0> [] 'Sequential_File_1:DSLink3.v'
;


#################################################################
#### STAGE: Sequential_File_2
## Operator
export
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\copy_stage.txt'
-overwrite
-rejects continue
## General options
[ident('Sequential_File_2'); jobmon_ident('Sequential_File_2')]
## Inputs
0< [] 'Copy_0:DSLink4.v'
;
# End of OSH code

and the job score dump generated at run time is the following:

Code:


main_program: This step has 2 datasets:

ds0: {op0[1p] (sequential Sequential_File_1)
      eAny<>eCollectAny
      op1[2p] (parallel Copy_0)}

ds1: {op1[2p] (parallel Copy_0)
      >>eCollectAny
      op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}

It has 3 operators:

op0[1p] {(sequential Sequential_File_1)
    on nodes (
      node1[op0,p0]
    )}


op1[2p] {(parallel Copy_0)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
    )}


op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
    on nodes (
      node2[op2,p0]
    )}

It runs 4 processes on 2 nodes.

2) One input link, one output link AND FORCE=TRUE

In this case the OSH script generated at compile time is the following:

Code:

# OSH / orchestrate script for Job Copy_Stage compiled at 10.00.35 23 apr 2013
#################################################################
#### STAGE: Copy_0
## Operator
copy
## Operator options
-force
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
  CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;


#################################################################
#### STAGE: Sequential_File_1
## Operator
import
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file  'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\DataStage87TutorialFiles\\GlobalCo_BillTo_CONVDATE.txt'
-rejects continue
-reportProgress yes
## General options
[ident('Sequential_File_1'); jobmon_ident('Sequential_File_1')]
## Outputs
0> [] 'Sequential_File_1:DSLink3.v'
;


#################################################################
#### STAGE: Sequential_File_2
## Operator
export
## Operator options
-schema record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  CUSTOMER_NUMBER:string[max=10];
  DATE_1:date;
  DATE_2:date;
  DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\copy_stage.txt'
-overwrite
-rejects continue
## General options
[ident('Sequential_File_2'); jobmon_ident('Sequential_File_2')]
## Inputs
0< [] 'Copy_0:DSLink4.v'
;
# End of OSH code

and the job score dump generated at run time is the following:

Code:

main_program: This step has 2 datasets:

ds0: {op0[1p] (sequential Sequential_File_1)
      eAny<>eCollectAny
      op1[2p] (parallel Copy_0)}

ds1: {op1[2p] (parallel Copy_0)
      >>eCollectAny
      op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}

It has 3 operators:

op0[1p] {(sequential Sequential_File_1)
    on nodes (
      node1[op0,p0]
    )}

op1[2p] {(parallel Copy_0)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
    )}

op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
    on nodes (
      node2[op2,p0]
    )}

It runs 4 processes on 2 nodes.

Between the OSH scripts in cases 1) and 2), and between the score dumps in cases 1) and 2), I do not see any differences... they are the same.

Also, in the "performance statistics" I do not see any difference at all!

So, what is the difference between setting the FORCE property to TRUE (FORCE=TRUE) and to FALSE (FORCE=FALSE)?

From the documentation, it looks like if the Force property is set to TRUE (FORCE=TRUE), DataStage does not optimize the job by removing the copy operator at compile time; if the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.

I have tried to re-compile and re-run many times, but the result is always the same: the copy operator is always present!
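
For what it's worth, comparing the two OSH listings in this post line by line does turn up one difference: with FORCE=TRUE the Copy_0 section carries an extra "## Operator options" / "-force" pair. The score dumps, on the other hand, are identical. A minimal Python sketch of that comparison, using only the first lines of the Copy_0 sections copied from the listings above (the function name is made up for illustration):

```python
import difflib

# Start of the Copy_0 section as compiled with FORCE=FALSE (case 1).
OSH_FORCE_FALSE = """\
#### STAGE: Copy_0
## Operator
copy
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
"""

# Start of the Copy_0 section as compiled with FORCE=TRUE (case 2).
OSH_FORCE_TRUE = """\
#### STAGE: Copy_0
## Operator
copy
## Operator options
-force
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
"""


def added_lines(before, after):
    """Return the lines that appear in `after` but not in `before`."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff marks lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


print(added_lines(OSH_FORCE_FALSE, OSH_FORCE_TRUE))
```

So the property does reach the generated OSH. What the sketch cannot show is whether it changes anything at run time for this particular job.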

Posted: Tue Apr 23, 2013 2:07 pm
by chulett
So, as you are finding, having that property set to FALSE just means that the Copy Stage may be optimized out - not that it will be. Unfortunately, I can't tell you under what exact circumstances it will decide it's not really needed and remove it... perhaps others can. Or you could open a support case and see if you can find out what the official rules are for Copy Stage removal, or suggested best practices on when you would set FORCE=TRUE.

Posted: Tue Apr 23, 2013 2:20 pm
by ray.wurlod
Is the Copy stage doing anything, such as dropping or renaming columns?

Posted: Tue Apr 23, 2013 2:26 pm
by Enzopre
chulett wrote: So, as you are finding, having that property set to FALSE just means that the Copy Stage may be optimized out - not that it will be. .......
Now it's all clearer. This is what I thought. Indeed, from the documentation:

[....] if the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.
ray.wurlod wrote: Is the Copy stage doing anything, such as dropping or renaming columns?
Not at all! The Copy stage simply copies the data from Sequential_File_1 to Sequential_File_2. You can see this from the OSH scripts!

Posted: Tue Apr 23, 2013 3:45 pm
by chulett
Enzopre wrote: Indeed, from the documentation:
We know, you've now posted that paraphrased section of the documentation three times in the thread. :wink:

Posted: Wed Apr 24, 2013 3:02 am
by eph
Hi,

Try this example, which will show you how DS optimizes (or does not optimize) the copy operation.

I suppose that your copy operator is not optimized out since it is partitioning the data (it is the only parallel stage in your job) and is the only active stage?

Check also this answer from Ray

Eric

Posted: Wed Apr 24, 2013 4:06 am
by Enzopre
eph wrote: [....]

I suppose that your copy operator is not optimized out since it is partitioning the data (it is the only parallel stage in your job) and is the only active stage?

[....]
Yes, exactly! Thanks anyway for the examples you suggested.

I'll let you know....