How to use Copy stage Force property
Hi all,
I have a question about the Copy stage.
From the documentation it looks like if the Force property is set to TRUE (FORCE=TRUE), DataStage does not optimize the job by removing the copy operator at compile time; if instead the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.
What is not clear is when we should set FORCE=TRUE or FORCE=FALSE, and why. More precisely, what happens, and why, if:
1) we have one input link, one output link and FORCE=TRUE
2) we have one input link, one output link and FORCE=FALSE
3) we have one input link, multiple output links and FORCE=TRUE
4) we have one input link, multiple output links and FORCE=FALSE
Regards.
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
You have to look at the job score to see the difference. If you have multiple outputs from the Copy stage, you may not find any difference.
Also, the optimizer will not change your job design, only the way the job executes; hence the job score is where you should look.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
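A minimal sketch of the check suggested above: given the text of a job score dump (typically enabled with the APT_DUMP_SCORE environment variable), list the operators and test whether a given stage still has one. The sample text is the operator section of the score dump posted later in this thread; the function names `list_operators` and `stage_in_score` are hypothetical helpers, not part of DataStage, and the regular expression only assumes the line shape seen in that dump.

```python
import re

# Operator section of the score dump as posted in this thread.
SCORE = """\
It has 3 operators:
op0[1p] {(sequential Sequential_File_1)
on nodes (
node1[op0,p0]
)}
op1[2p] {(parallel Copy_0)
on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
on nodes (
node2[op2,p0]
)}
It runs 4 processes on 2 nodes.
"""

# Matches operator header lines such as "op1[2p] {(parallel Copy_0)".
# ASSUMPTION: other score dumps follow the same line shape as this one.
OP_LINE = re.compile(r"^op(\d+)\[(\d+)p\] \{\((sequential|parallel) (.+)\)$")


def list_operators(score_text):
    """Return one dict per operator header line found in the score text."""
    operators = []
    for line in score_text.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            operators.append({
                "index": int(match.group(1)),
                "partitions": int(match.group(2)),
                "mode": match.group(3),
                "label": match.group(4),
            })
    return operators


def stage_in_score(score_text, stage_name):
    """True if some operator label mentions the stage name as a whole word."""
    return any(stage_name in op["label"].split()
               for op in list_operators(score_text))
```

Calling `stage_in_score(SCORE, "Copy_0")` on this dump returns `True`, which is the situation described in the thread: the copy operator is still present at run time.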
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Just looking at the row counts ("performance statistics") in Designer will give you a clue. If there are row counts on both sides of the Copy stage, then it is present both in the design and at run time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
So, I tried recompiling and re-running the job in the following two cases, each with one input link and one output link: first with FORCE=FALSE, then with FORCE=TRUE.
1) One input link, one output link and FORCE=FALSE
In this case the OSH script generated at compile time is the following:
Code:
# OSH / orchestrate script for Job Copy_Stage compiled at 09.54.30 23 apr 2013
#################################################################
#### STAGE: Copy_0
## Operator
copy
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;
#################################################################
#### STAGE: Sequential_File_1
## Operator
import
## Operator options
-schema record
{final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
CUSTOMER_NUMBER:string[max=10];
DATE_1:date;
DATE_2:date;
DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\DataStage87TutorialFiles\\GlobalCo_BillTo_CONVDATE.txt'
-rejects continue
-reportProgress yes
## General options
[ident('Sequential_File_1'); jobmon_ident('Sequential_File_1')]
## Outputs
0> [] 'Sequential_File_1:DSLink3.v'
;
#################################################################
#### STAGE: Sequential_File_2
## Operator
export
## Operator options
-schema record
{final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
CUSTOMER_NUMBER:string[max=10];
DATE_1:date;
DATE_2:date;
DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\copy_stage.txt'
-overwrite
-rejects continue
## General options
[ident('Sequential_File_2'); jobmon_ident('Sequential_File_2')]
## Inputs
0< [] 'Copy_0:DSLink4.v'
;
# End of OSH code
and the job score dump generated at run time is the following:
Code:
main_program: This step has 2 datasets:
ds0: {op0[1p] (sequential Sequential_File_1)
eAny<>eCollectAny
op1[2p] (parallel Copy_0)}
ds1: {op1[2p] (parallel Copy_0)
>>eCollectAny
op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}
It has 3 operators:
op0[1p] {(sequential Sequential_File_1)
on nodes (
node1[op0,p0]
)}
op1[2p] {(parallel Copy_0)
on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
on nodes (
node2[op2,p0]
)}
It runs 4 processes on 2 nodes.
2) One input link, one output link and FORCE=TRUE
In this case the OSH script generated at compile time is the following:
Code:
# OSH / orchestrate script for Job Copy_Stage compiled at 10.00.35 23 apr 2013
#################################################################
#### STAGE: Copy_0
## Operator
copy
## Operator options
-force
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
## Outputs
0> [modify (
keep
CUSTOMER_NUMBER,DATE_1,DATE_2,DATE_3;
)] 'Copy_0:DSLink4.v'
;
#################################################################
#### STAGE: Sequential_File_1
## Operator
import
## Operator options
-schema record
{final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
CUSTOMER_NUMBER:string[max=10];
DATE_1:date;
DATE_2:date;
DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\DataStage87TutorialFiles\\GlobalCo_BillTo_CONVDATE.txt'
-rejects continue
-reportProgress yes
## General options
[ident('Sequential_File_1'); jobmon_ident('Sequential_File_1')]
## Outputs
0> [] 'Sequential_File_1:DSLink3.v'
;
#################################################################
#### STAGE: Sequential_File_2
## Operator
export
## Operator options
-schema record
{final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
CUSTOMER_NUMBER:string[max=10];
DATE_1:date;
DATE_2:date;
DATE_3:date;
)
-file 'C:\\DS_DISK\\BI_CORSO\\TutorialIBMVin\\copy_stage.txt'
-overwrite
-rejects continue
## General options
[ident('Sequential_File_2'); jobmon_ident('Sequential_File_2')]
## Inputs
0< [] 'Copy_0:DSLink4.v'
;
# End of OSH code
and the job score dump generated at run time is the following:
Code:
main_program: This step has 2 datasets:
ds0: {op0[1p] (sequential Sequential_File_1)
eAny<>eCollectAny
op1[2p] (parallel Copy_0)}
ds1: {op1[2p] (parallel Copy_0)
>>eCollectAny
op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}
It has 3 operators:
op0[1p] {(sequential Sequential_File_1)
on nodes (
node1[op0,p0]
)}
op1[2p] {(parallel Copy_0)
on nodes (
node1[op1,p0]
node2[op1,p1]
)}
op2[1p] {(sequential APT_RealFileExportOperator in Sequential_File_2)
on nodes (
node2[op2,p0]
)}
It runs 4 processes on 2 nodes.
Apart from the -force operator option that appears in the OSH script for case 2), I do not see any differences between the OSH scripts in cases 1) and 2), or between the score dumps in cases 1) and 2).
Also in the "performance statistics" I do not see any difference at all!
So, what is the difference between setting the Force property to TRUE (FORCE=TRUE) and setting it to FALSE (FORCE=FALSE)?
From the documentation it looks like if the Force property is set to TRUE (FORCE=TRUE), DataStage does not optimize the job by removing the copy operator at compile time; if instead the Force property is set to FALSE (FORCE=FALSE), DataStage can decide whether or not to optimize the job by removing the copy operator.
I have recompiled and re-run the job many times, but the result is always the same: the copy operator is always present!
Last edited by Enzopre on Tue Apr 23, 2013 2:07 pm, edited 1 time in total.
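Comparing the Copy_0 sections of the two OSH scripts posted above line by line makes the effect of the property on the generated script explicit. The sketch below uses Python's standard `difflib` on the first lines of each Copy_0 section, copied from the post; `changed_lines` is a hypothetical helper name. It only compares the script text and says nothing about what the engine does with the option at run time.

```python
import difflib

# Start of the Copy_0 section compiled with FORCE=FALSE (copied from the post).
COPY_FORCE_FALSE = """\
#### STAGE: Copy_0
## Operator
copy
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
""".splitlines()

# Start of the Copy_0 section compiled with FORCE=TRUE (copied from the post).
COPY_FORCE_TRUE = """\
#### STAGE: Copy_0
## Operator
copy
## Operator options
-force
## General options
[ident('Copy_0'); jobmon_ident('Copy_0')]
## Inputs
0< [] 'Sequential_File_1:DSLink3.v'
""".splitlines()


def changed_lines(before, after):
    """Return only the added ("+") and removed ("-") lines of a unified diff."""
    diff = list(difflib.unified_diff(before, after, lineterm="", n=0))
    # The first two diff lines are the "---" / "+++" file headers; skip them,
    # then drop the "@@" hunk markers by keeping only "+" and "-" lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


for line in changed_lines(COPY_FORCE_FALSE, COPY_FORCE_TRUE):
    print(line)
```

For these two fragments the loop prints `+## Operator options` and `+-force`, so the compiled script does record the property even though the score dumps as posted are identical.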
So, as you are finding, having that property set to FALSE just means that the Copy Stage may be optimized out - not that it will be. Unfortunately, I can't tell you under what exact circumstances it will decide it's not really needed and remove it... perhaps others can. Or you could open a support case and see if you can find out what the official rules are for Copy Stage removal, or suggested best practices on when you would set FORCE=TRUE.
-craig
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote: So, as you are finding, having that property set to FALSE just means that the Copy Stage may be optimized out - not that it will be. .......
Now it is all clearer. This is what I had thought. Indeed, from the documentation:
[....] if the Force property is set to FALSE (FORCE=FALSE) then DataStage can decide whether or not to optimize the job by removing the copy operator.
ray.wurlod wrote: Is the Copy stage doing anything, such as dropping or renaming columns?
Not at all! The Copy stage simply copies the data from Sequential_File_1 to Sequential_File_2. You can see this from the OSH scripts!
Last edited by Enzopre on Tue Apr 23, 2013 3:54 pm, edited 1 time in total.
Hi,
Try this example, which will show you how DataStage does or does not optimize out the copy operation.
I suppose that your copy operator is not optimized out because it is partitioning the data (it is the only parallel stage in your job) and is the only active stage?
Check also this answer from Ray.
Eric
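The supposition about partitioning can at least be checked against the dataset section of the score dump posted earlier in this thread. The sketch below pairs each link with the operators on either side of it. The arrow meanings in `ARROWS` are an assumption based on how score dump notation is commonly described ("<>" for sequential to parallel, ">>" for parallel to sequential) and should be verified against IBM's documentation; `describe_links` is a hypothetical helper. It does not show why the optimizer kept the stage.

```python
import re

# Dataset section of the score dump as posted in this thread.
DATASETS = """\
main_program: This step has 2 datasets:
ds0: {op0[1p] (sequential Sequential_File_1)
eAny<>eCollectAny
op1[2p] (parallel Copy_0)}
ds1: {op1[2p] (parallel Copy_0)
>>eCollectAny
op2[1p] (sequential APT_RealFileExportOperator in Sequential_File_2)}
"""

# ASSUMPTION: arrow meanings as commonly described for score dumps.
ARROWS = {"<>": "sequential to parallel", ">>": "parallel to sequential"}

# Matches a link endpoint such as "op1[2p] (parallel Copy_0)".
ENDPOINT = re.compile(r"op\d+\[(\d+)p\] \((sequential|parallel) ([^)]+)\)")


def describe_links(score_text):
    """Return (source, source partitions, link kind, target, target partitions)."""
    lines = [line.strip() for line in score_text.splitlines()]
    links = []
    for i, line in enumerate(lines):
        arrow = next((a for a in ARROWS if a in line), None)
        # An arrow line must have an endpoint line before and after it.
        if arrow is None or i == 0 or i == len(lines) - 1:
            continue
        source = ENDPOINT.search(lines[i - 1])
        target = ENDPOINT.search(lines[i + 1])
        if source and target:
            links.append((source.group(3), int(source.group(1)),
                          ARROWS[arrow],
                          target.group(3), int(target.group(1))))
    return links
```

On this dump, `describe_links(DATASETS)` reports data going from Sequential_File_1 (1 partition) to Copy_0 (2 partitions) and back to a single partition for the export in Sequential_File_2, which is consistent with Copy_0 being the only parallel operator in the job.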