Copy Vs Transformer..

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Copy Vs Transformer..

Post by bskumar4u »

I have a .ds file with 2.6 billion records.
I just want to take a dump of this file.
I can use a Copy stage, but out of curiosity: can we use a Transformer in place of the Copy?
I know that the Transformer is robust and is compiled to C++ before it runs under Orchestrate,
but is the processing time the same for Copy and Transformer?
As far as I know, job performance degrades once a job has, say, 6+ Transformers (DataStage v8.1).

So, for a basic copy function, can we use a Transformer?
....................Shanthi
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

The Transformer stage is very costly compared to the Copy stage. Either use a Copy stage to create a dump of the .ds file, use the orchadmin utility on the server side to create a copy of your dataset, or use the Data Set Management option in DataStage Designer to create a copy of the dataset.
Arvind
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

Could you please elaborate a little?
I mean, the Transformer in 8.1 is far improved compared to older versions.
Does using a Transformer really affect performance?
What are the pros and cons of using it?
....................Shanthi
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

And regarding the dataset management tool: we may copy the .ds, but it won't copy the reference nodes through which the data was loaded into the .ds (meaning it may not copy the appropriate data).

Correct my understanding if I'm wrong!
....................Shanthi
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

The Copy stage is a lightweight stage; the Transformer stage is not, even with its improved features in 8.1.

Why not use a Copy stage to copy the dataset?

Use the job design below.

Data Set stage --> Copy stage --> Data Set stage.
Arvind
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

Yes, the simple design can easily be done,
but my questions are unanswered.
Is the processing time the same for both?
How are the terms 'lightweight' and 'heavy' justified?
What damage does it do when we use a Transformer instead of a Copy?
....................Shanthi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The only way to answer those questions would be to try both ways on your system with your data. Then you'll know.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

I've tried both ways and had to abort them, because in both cases
processing started at around 120-140k rows/sec and stabilised at around 60k rows/sec.
At this speed, loading 2.6 billion rows...?
The difference I observed was in compile time;
at run time both processed almost the same rows/sec.
So I wanted to know...

Also, is there any other method to make a copy of a .ds?
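
For scale, here is a rough sketch of what those rates imply for elapsed time. It assumes the rate stays constant for the whole run, which the aborted runs above did not confirm; the function name is made up for illustration. At a steady 60k rows/sec, 2.6 billion rows works out to roughly 12 hours.

```python
def hours_to_process(rows: int, rows_per_sec: float) -> float:
    """Return the elapsed hours needed to move `rows` at a constant rate."""
    if rows_per_sec <= 0:
        raise ValueError("rows_per_sec must be positive")
    return rows / rows_per_sec / 3600


TOTAL_ROWS = 2_600_000_000  # 2.6 billion rows, as stated in the post

# Rates quoted in the post: initial burst (120-140k) and steady state (60k).
for rate in (60_000, 120_000, 140_000):
    hours = hours_to_process(TOTAL_ROWS, rate)
    print(f"{rate:>7,} rows/sec -> {hours:.1f} hours")
```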
....................Shanthi
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Compile time will be different: longer with the Transformer initially, because of the need to create the C++ source and then compile it, whereas the Copy stage is merely added to the OSH script to be executed at runtime.

The simplest job for copying a ds to another ds? DataSet-->DataSet. No need for the copy stage in the job design. You can use job monitor to see the progress instead of performance statistics, or better yet use the performance analysis feature.

At this point, you are probably limited by:
1) The maximum throughput of your hardware (storage, network, server)
2) Transport buffer sizes within the parallel engine (tunable)

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would imagine the copy stage would be optimized out of the runtime job unless you forced it to be kept... and as noted, there's no need for that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Re: Copy Vs Transformer..

Post by ray.wurlod »

bskumar4u wrote: i have a .ds file with 2.6billion records.
No you don't. The .ds file contains no records whatsoever. It's a descriptor file that describes locations of physical files that do contain your data.

Another way to make a copy of a data set is to use the orchadmin cp command. This does not generate any rate statistics (e.g. rows/sec) but you can still measure its elapsed time.
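
Since orchadmin cp reports no rows/sec, one way to capture its elapsed time is to wrap the call in a small timer. The sketch below is only an illustration: the helper names are invented, the source-then-target argument order is an assumption to check against your installation's orchadmin usage text, and the command will only work in a shell where the DataStage environment (including APT_CONFIG_FILE) is already set up.

```python
import subprocess
import time


def build_cp_command(source_ds: str, target_ds: str) -> list[str]:
    """Build the argument list for `orchadmin cp`.

    Assumption: the command takes the source descriptor, then the target.
    """
    return ["orchadmin", "cp", source_ds, target_ds]


def timed_run(command: list[str]) -> tuple[int, float]:
    """Run a command; return its exit code and elapsed wall-clock seconds."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    return completed.returncode, time.perf_counter() - start


# Example with hypothetical paths (requires the DataStage environment):
# code, seconds = timed_run(build_cp_command("/data/in/big.ds", "/data/out/big_copy.ds"))
# print(f"exit code {code}, {seconds / 3600:.2f} hours elapsed")
```

Comparing that elapsed time against the same copy run as a DataSet-->DataSet job shows whether the job design or the underlying hardware is the limiting factor.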
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

Agreed! But fine, let's leave that.
....................Shanthi