Performance: Remove Duplicates or Transformer?

ag_ram · Post by **ag_ram** » Sat Jan 26, 2008 4:45 pm

All,
Fact: Remove Duplicate Stage functionality can be implemented in a Transformer Stage.
Does this guarantee any improvement in execution time or in the handling of bulk records, or does it offer any other specific advantage?
Please advise, so that I can get a deeper understanding of this.

ray.wurlod · Post by **ray.wurlod** » Sat Jan 26, 2008 5:07 pm

Fact: Remove duplicates can be performed in any stage that has an input link.

The answer to your question may well be dependent on the data type of the keys and the size of the records. I would suggest experimentation to determine whether there is any difference at all. There is a startup (calling) overhead for a Transformer stage but, for a sufficiently large volume of data, this may be considered to be negligible.

All remove duplicates methods require sorted data, so that cost can be factored out of the equation.
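
To illustrate why sorted input matters, here is a minimal sketch in Python (not DataStage code) of the compare-with-previous-key logic that a Transformer-based approach would typically rely on. The function name `dedup_sorted` and the sample rows are hypothetical, made up for illustration only.

```python
from operator import itemgetter


def dedup_sorted(rows, key):
    """Yield the first row of each run of equal keys.

    Assumes rows arrive already sorted (or at least grouped) by key;
    only adjacent rows are compared, so no lookup table is needed.
    """
    previous = object()  # sentinel that never equals a real key value
    for row in rows:
        current = key(row)
        if current != previous:
            yield row
            previous = current


# Hypothetical sample data, sorted by "id".
rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 3, "name": "e"},
]
unique_rows = list(dedup_sorted(rows, key=itemgetter("id")))
```

Because only neighbouring rows are compared, the same function applied to unsorted data would let non-adjacent duplicates through, which is why the sort cost is common to both approaches.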