Why use a Copy stage if there is only a single input and output?
Case 1: Using a Copy stage between a sequential file (source) and a sequential file (target), doing nothing but passing the source records through to the target.
Case 2: Sending the source records directly to the target without a Copy stage.
Which case has better performance, and why?
Thanks.
Welcome aboard. The optimized job score removes a Copy stage that does a simple transfer, so the two jobs will have identical performance. You can prevent this removal via the Force option in the Copy stage itself, but the copy operator is quite efficient and you may still not see any performance difference.
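One way to confirm whether the optimizer has actually removed the Copy stage is to dump the job score. The setting below is the standard parallel-engine reporting variable; where you set it (project defaults in the Administrator, or as a job parameter) depends on your environment:

```shell
# Add APT_DUMP_SCORE as a job parameter or project-level default.
# With it set to 1, the parallel engine writes the optimized job
# score to the job log at run time; a Copy stage that has been
# optimized away simply will not appear as an operator in that score.
APT_DUMP_SCORE=1
```

If the Copy stage is still listed in the score, then the Force option (or something else about the stage, such as repartitioning) kept it from being optimized out.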
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I recall something fascinating from the version 7.5 days. A job was reading from a DB2 Enterprise stage and writing to a Data Set stage with no stage in between, and the job intelligently created the data set segment files in the resource path of the DB2 node. But since the data set was to be used by downstream jobs, it was taking a long time to read the segment files and pull the data across the network. So we had to add a Copy stage before the Data Set stage to bypass the job's intelligence and get the segment files created on the DataStage server instead of the DB2 server.
Zulfi
There are several good reasons to use a Copy stage. It makes it easy to add a Peek stage if you are having issues. An ex-IBMer I worked with always had Copy stages in places you would think were a waste of time; he counted on the optimizer to remove them. Another reason is to change how the data is partitioned. So sometimes a stage that looks harmless to remove might be doing more than you think.
If you have a job so complex that you need to debug it all the time, just leave the Copy stages in. Right before a Lookup, Join or Merge is always a good place to debug a job giving questionable results.
Mamu Kim