performance variance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

performance variance

Post by pandeesh »

Hi,

I want to know the difference between the below designs:

The job design is simple: extract from a table and load into a dataset.

The source table contains 4 million records and we are using a 2-node configuration.

1) Oracle stage ----> Transformer ----> Dataset

2) Oracle stage ----> Copy ----> Dataset

3) Oracle stage ----> Dataset


Among these three designs, which one will be the most effective?

How will the performance compare?

Thanks
pandeeswaran
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: performance variance

Post by SURA »

As far as I know, you won't find much difference in the results; it depends on the data volume.

Since you are just writing to a file, I doubt you will see much difference.

Later on, if you want to do something with the data, that is when the Transformer (TFM) will help.

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

(2) and (3) are identical, assuming the Force option is not used in the Copy stage. Adding a Transformer stage, even one that transfers data only, will add a small demand for resources.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

What will be the difference between (2) and (3) if the Force option is enabled in the Copy stage?

Will there be any difference in runtime?

thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Maybe, maybe not. For 0 rows, definitely not. Times are only reported in whole seconds, so there may be no measurable difference for a moderate number of rows either. How many will depend upon how wide the rows are; you did not offer that information.
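The whole-second reporting point can be illustrated with a quick back-of-the-envelope sketch. The per-row overhead figure below is purely a hypothetical assumption for illustration, not a measured DataStage number:

```python
# Hypothetical per-row overhead added by a pass-through Transformer stage.
# This value is an assumption for illustration, not a measured figure.
OVERHEAD_PER_ROW_SEC = 0.2e-6  # 0.2 microseconds per row

def rows_to_notice(granularity_sec=1.0, overhead=OVERHEAD_PER_ROW_SEC):
    """Rows needed before the extra elapsed time exceeds the
    whole-second granularity at which job times are reported."""
    return int(granularity_sec / overhead)

# The 4-million-row job from the question:
extra_time_4m = 4_000_000 * OVERHEAD_PER_ROW_SEC
print(f"Extra time for 4M rows: {extra_time_4m:.2f} s")
print(f"Rows before a 1-second difference shows: {rows_to_notice():,}")
```

Under this assumed overhead, 4 million rows add under a second of elapsed time, which would be invisible in whole-second reporting; wider rows or heavier derivations would shift the numbers.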
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

The Transformer is a heavyweight stage when you are talking about 4 million records.
It will take more time (the difference may only be seconds) than (2) and (3).
And as everybody has said, (2) and (3) are equal, so I would use option (3).
Thanks and Regards,
ETL User
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

So, what is the purpose of the Copy stage?
Where does it play a vital role?

Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's the cheapest stage for renaming columns, dropping columns, re-ordering columns on the link and executing implicit data type conversions.

It's particularly useful for making copies of its input when you need more than one copy.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

With a Copy stage you can:
take more than one copy of the input data
reorder the columns
rename columns
drop columns, etc.

It all depends on what you need to do and where you need to use it.

Example scenario: input data passes from a Copy stage into an Aggregator stage as well as into a Join stage, and an inner join then combines the data...

DS User
Post Reply