Why use a Copy stage with a single input and a single output?


41ranjeet
Participant
Posts: 8
Joined: Fri Jan 15, 2010 10:43 pm


Post by 41ranjeet »

Why use a Copy stage if there is only a single input and a single output?

Case 1: We use a Copy stage between a sequential file (source) and a sequential file (target) and do nothing except pass the source records to the target.

Case 2: We send the source records directly to the target without a Copy stage.

Which case has the better performance and why?

Thanks.
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard. The optimized job score will remove a Copy stage that does a simple transfer, so the two jobs will have identical performance. You can prevent this removal via the Force option in the Copy stage itself, but it's a pretty efficient operator and you may still not see any performance difference.
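As a rough illustration of that behaviour, here is a toy model in Python. It is not DataStage's actual optimizer, and the `Stage` class, its fields and the `optimize` function are hypothetical names made up for this sketch. It only shows the idea that a pass-through Copy stage is dropped from the flow unless Force is set, so Case 1 and Case 2 end up as the same flow.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    kind: str               # "source", "copy" or "target" in this toy model
    outputs: int = 1        # number of output links
    force: bool = False     # stands in for the Copy stage's Force option
    modifies: bool = False  # True if the stage drops or renames columns


def optimize(flow):
    """Return the flow with pass-through Copy stages removed (toy model only)."""
    kept = []
    for stage in flow:
        is_noop_copy = (
            stage.kind == "copy"
            and stage.outputs == 1
            and not stage.modifies
            and not stage.force
        )
        if not is_noop_copy:
            kept.append(stage)
    return kept


# Case 1: source -> Copy -> target; Case 2: source -> target.
case1 = [Stage("src", "source"), Stage("cp", "copy"), Stage("tgt", "target")]
case2 = [Stage("src", "source"), Stage("tgt", "target")]
# Same as Case 1, but with Force set on the Copy stage.
forced = [Stage("src", "source"), Stage("cp", "copy", force=True), Stage("tgt", "target")]

print([s.name for s in optimize(case1)])   # ['src', 'tgt']
print([s.name for s in optimize(case2)])   # ['src', 'tgt']
print([s.name for s in optimize(forced)])  # ['src', 'cp', 'tgt']
```

In this sketch the first two flows come out identical after optimization, which is why no performance difference would be expected between them; only the forced variant keeps the Copy stage.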
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

I recall something fascinating that we observed back in the version 7.5 days.

The job read from a DB2 Enterprise stage and wrote to a dataset with no stage in between. The job intelligently created the dataset segment files in the resource path of the DB2 node. Because the dataset was meant to be used by downstream jobs, reading it took a long time, since the segment data had to be pulled across the network. So we had to use a Copy stage when creating the dataset to bypass the job's intelligence and get the segment files created on the DataStage server instead of the DB2 server.
- Zulfi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not all optimizations turn out to be optimal. The same is especially true of operator combination, which is way too aggressively done for my liking.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

There are several good reasons to use a Copy stage. One is that it makes it easy to add a Peek stage if you are having issues. I worked with an ex-IBMer who always had Copy stages in places you would think were a waste of time; he counted on the optimizer to remove them. Another reason is to change how the data is partitioned. So something that looks harmless to remove might be doing more than you think.

If you have a job so complex that you need to debug it all the time, just leave the Copy stages in. Right before a Lookup, Join or Merge is always a good place to debug a job that is giving questionable results.
Mamu Kim
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

I have seen an IBM project where adding a Copy stage after each stage was mandatory. It makes no difference at run time, but maintenance becomes easier when you have to add more flows to the design.
- Zulfi
41ranjeet
Participant
Posts: 8
Joined: Fri Jan 15, 2010 10:43 pm

Post by 41ranjeet »

ray.wurlod wrote:Welcome aboard. The optimized job score will remove a Copy stage that does a simple transfer, so the two jobs will have identical performance. You can prevent this removal via the Force option in th ...
Thanks, Ray, for the warm welcome.