performance variance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

performance variance

Post by pandeesh »

Hi,

I want to know the difference between the below designs:

The job design is simple: extract from a table and load into a dataset.

The source table contains 4 million records and we are using a 2-node configuration.

1) Oracle stage ----> Transformer ----> Dataset

2) Oracle stage ----> Copy ----> Dataset

3) Oracle stage ----> Dataset


Among these three designs, which one will be the most effective?

How will the performance compare?

Thanks
pandeeswaran
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: performance variance

Post by SURA »

As far as I know, you won't find much difference in the results; it depends on the data volume.

Since you are just writing to a file, I doubt you will see much difference.

Later on, if you want to do something with the data, that is when the Transformer (TFM) will help.

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

(2) and (3) are identical, assuming the Force option is not used in the Copy stage. Adding a Transformer stage, even one that transfers data only, will add a small demand for resources.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

What will be the difference between (2) and (3) if the Force option is enabled in the Copy stage?

Will there be any difference in runtime?

thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Maybe, maybe not. For 0 rows, definitely not. Times are only reported in whole seconds, so there may be no measurable difference for a moderate number of rows either. How many will depend upon how wide the rows are; you did not offer that information.
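The whole-second reporting point can be illustrated with a quick back-of-the-envelope sketch. The per-row overhead figure below is purely a hypothetical assumption for illustration, not a measured DataStage number:

```python
# Hypothetical per-row overhead added by a pass-through Transformer stage.
# This value is an assumption for illustration, not a measured figure.
OVERHEAD_PER_ROW_SEC = 0.2e-6  # 0.2 microseconds per row

def rows_to_notice(granularity_sec=1.0, overhead=OVERHEAD_PER_ROW_SEC):
    """Rows needed before the extra elapsed time exceeds the
    whole-second granularity at which job times are reported."""
    return int(granularity_sec / overhead)

# The 4-million-row job from the question:
extra_time_4m = 4_000_000 * OVERHEAD_PER_ROW_SEC
print(f"Extra time for 4M rows: {extra_time_4m:.2f} s")
print(f"Rows before a 1-second difference shows: {rows_to_notice():,}")
```

Under this assumed overhead, 4 million rows add under a second of elapsed time, which would be invisible in whole-second reporting; wider rows or heavier derivations would shift the numbers.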
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

The Transformer is a heavyweight stage when you are talking about 4 million records.
It will take more time (the difference may only be seconds) than (2) and (3).
And as everybody has said, (2) and (3) are equal, so I would use option (3).
Thanks and Regards,
ETL User
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

So, what is the purpose of the Copy stage?
Where does it play a vital role?

Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's the cheapest stage for renaming columns, dropping columns, re-ordering columns on the link and executing implicit data type conversions.

It's particularly useful for making copies of its input when you need more than one copy.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

With a Copy stage you can:
take more than one copy of the input data
reorder the columns
rename columns
drop columns, etc.

It all depends on what you need to do and where you need to use it.

Example scenario: input data passes from a Copy stage into an Aggregator stage as well as into a Join stage, and an inner join then combines the data...

DS User
Post Reply