Why use a Copy stage with a single input and output?

Posted: Wed Feb 06, 2013 1:07 pm
by 41ranjeet
Why use a Copy stage if there is only a single input and a single output?

Case 1: We use a Copy stage between a sequential file (source) and a sequential file (target) and do nothing else, simply passing the source records to the target.

Case 2: We send the source records directly to the target without a Copy stage.

Which case has the better performance and why?

Thanks.

Posted: Wed Feb 06, 2013 2:29 pm
by mobashshar

Posted: Wed Feb 06, 2013 8:23 pm
by ray.wurlod
Welcome aboard. The optimized job score will remove a Copy stage that does a simple transfer, so the two jobs will have identical performance. You can prevent this removal via the Force option in the Copy stage itself, but it's a pretty efficient operator and you may still not see any performance difference.
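
As a toy illustration of the behaviour described above (this is not DataStage code, and the real optimizer is certainly more involved), the removal rule can be modelled in a few lines of Python. The names `Operator`, `optimize`, `outputs` and `force` are all made up for this sketch; `force` simply stands in for the Copy stage's Force option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for one operator in a job score."""
    name: str
    kind: str            # e.g. "source", "copy", "target"
    outputs: int = 1     # number of output links
    force: bool = False  # models the Copy stage's Force option


def optimize(flow):
    """Drop copy operators that pass one input straight to one output, unless forced."""
    return [
        op for op in flow
        if not (op.kind == "copy" and op.outputs == 1 and not op.force)
    ]


src = Operator("seq_src", "source")
tgt = Operator("seq_tgt", "target", outputs=0)

# Case 1 from the question: source -> Copy -> target, Force not set.
plain = [src, Operator("cp", "copy"), tgt]
# Same design, but with Force set on the Copy stage.
forced = [src, Operator("cp", "copy", force=True), tgt]

print([op.name for op in optimize(plain)])   # ['seq_src', 'seq_tgt']
print([op.name for op in optimize(forced)])  # ['seq_src', 'cp', 'seq_tgt']
```

In this model the unforced Case 1 flow ends up identical to Case 2, which is why the two jobs would be expected to perform the same.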

Posted: Thu Feb 07, 2013 1:13 am
by zulfi123786
Back in the version 7.5 days, as I recall, we observed something fascinating.

The job read from a DB2 Enterprise stage and wrote to a Dataset with no stage in between, and the job intelligently created the dataset segment files in the resource path of the DB2 node. But since the dataset was meant to be used by downstream jobs, it took a long time to read the segment files and move the data across the network. So we had to use a Copy stage when creating the dataset to bypass the job's intelligence and get the segment files created on the DataStage server instead of the DB2 server.

Posted: Thu Feb 07, 2013 3:46 pm
by ray.wurlod
Not all optimizations turn out to be optimal. The same is especially true of operator combination, which is way too aggressively done for my liking.

Posted: Sat Feb 09, 2013 6:44 pm
by kduke
There are several good reasons to use a Copy stage. It makes it easy to add a Peek stage if you are having issues. An ex-IBMer I worked with always had Copy stages in places you would think were a waste of time; he counted on the optimizer to remove them. Another reason is to change how the data is partitioned. So something that looks harmless to remove might be doing more than you think.

If you have a job which is so complex you need to debug it all the time then just leave the copy stages in. Right before a lookup, join or merge is always a good place to debug a job giving questionable results.

Posted: Sun Feb 10, 2013 12:13 am
by zulfi123786
I have seen an IBM project where it was mandatory to add a Copy stage after each stage. It makes no difference at run time, but maintenance becomes easier when you have to add more flows to the design.

Posted: Mon Feb 11, 2013 1:22 pm
by 41ranjeet
ray.wurlod wrote:Welcome aboard. The optimized job score will remove a Copy stage that does a simple transfer, so the two jobs will have identical performance. You can prevent this removal via the Force option in th ...
Thanks, Ray, for the warm welcome.