Performance: Remove Duplicates or Transformer?


Moderators: chulett, rschirm, roy

ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am


Post by ag_ram »

All,
Fact: The functionality of the Remove Duplicates stage can be implemented in a Transformer stage.
Does doing so guarantee any improvement in execution time or in the handling of bulk records, or does it depend on specific options?
Please advise so that I can get a deeper understanding of this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Fact: Remove duplicates can be performed in any stage that has an input link.

The answer to your question may well depend on the data type of the keys and the size of the records. I would suggest experimenting to determine whether there is any difference at all. There is a startup (calling) overhead for a Transformer stage but, for a sufficiently large volume of data, this can be considered negligible.

All remove duplicates methods require sorted data, so that cost can be factored out of the equation.
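
To illustrate why sorted input matters, here is a minimal sketch in Python (not DataStage code). It assumes rows are tuples keyed on their first field, and the function names are made up for the illustration. The first function compares each key with the previous row's key, which is roughly the pattern one would build with stage variables in a Transformer; the second keeps the first row of each run of equal keys, similar in spirit to a Remove Duplicates stage retaining the first record. Both make a single pass and only work because equal keys are adjacent after sorting.

```python
from itertools import groupby
from operator import itemgetter


def dedup_prev_key(rows, key):
    """Keep the first row of each key by comparing with the previous row's key."""
    result = []
    prev = object()  # sentinel that never equals a real key
    for row in rows:
        k = key(row)
        if k != prev:  # key change detected: first row of a new group
            result.append(row)
            prev = k
    return result


def dedup_groupby(rows, key):
    """Keep the first row of each run of adjacent rows that share a key."""
    return [next(group) for _, group in groupby(rows, key=key)]


# Hypothetical sample data: (key, payload)
rows = [(3, "c1"), (1, "a1"), (1, "a2"), (2, "b1"), (3, "c2")]

# Both methods need equal keys to be adjacent, hence the sort.
sorted_rows = sorted(rows, key=itemgetter(0))

by_prev_key = dedup_prev_key(sorted_rows, itemgetter(0))
by_groupby = dedup_groupby(sorted_rows, itemgetter(0))

# On unsorted input the key 3 appears in two separate runs and survives twice.
unsorted_result = dedup_prev_key(rows, itemgetter(0))
```

Since both approaches do the same amount of work per row once the data is sorted, any real difference between the two stages would have to be measured on the actual job and data.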
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.