Why use a Copy stage if there is only a single input and output?
Case 1: Using a Copy stage between a sequential file (source) and a sequential file (target), doing nothing but passing the source records through to the target.
Case 2: Sending the source records directly to the target without a Copy stage.
Which case has better performance, and why?
Thanks.
Welcome aboard. The optimized job score removes a Copy stage that does a simple transfer, so the two jobs will have identical performance. You can prevent this removal via the Force option in the Copy stage itself, but the copy operator is quite efficient and you may still not see any performance difference.
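One way to confirm whether the optimizer has actually removed the Copy stage is to dump the job score. The setting below is the standard parallel-engine reporting variable; where you set it (project defaults in the Administrator, or as a job parameter) depends on your environment:

```shell
# Add APT_DUMP_SCORE as a job parameter or project-level default.
# With it set to 1, the parallel engine writes the optimized job
# score to the job log at run time; a Copy stage that has been
# optimized away simply will not appear as an operator in that score.
APT_DUMP_SCORE=1
```

If the Copy stage is still listed in the score, then the Force option (or something else about the stage, such as repartitioning) kept it from being optimized out.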
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I recall something fascinating from the version 7.5 days. A job was reading from a DB2 Enterprise stage and writing to a Data Set stage with no stage in between, and the job intelligently created the data set segment files in the resource path of the DB2 node. But since the data set was to be used by downstream jobs, it was taking a long time to read the segment files and pull the data across the network. So we had to add a Copy stage before the Data Set stage to bypass the job's intelligence and get the segment files created on the DataStage server instead of the DB2 server.
Zulfi
There are several good reasons to use a Copy stage. It makes it easy to add a Peek stage if you are having issues. An ex-IBMer I worked with always had Copy stages in places you would think were a waste of time; he counted on the optimizer to remove them. Another reason is to change how the data is partitioned. So sometimes a stage that looks harmless to remove might be doing more than you think.
If you have a job so complex that you need to debug it all the time, just leave the Copy stages in. Right before a Lookup, Join or Merge is always a good place to debug a job giving questionable results.
Mamu Kim