Understanding of Remove Duplicate Stage Execution

ramsubbiah
Participant
Posts: 40
Joined: Tue Nov 11, 2008 5:49 am

Post by ramsubbiah »

Hi All,

I need a clarification regarding the Remove Duplicates stage.


Scenario 1:
Job Design:

Source Dataset ---> Sort stage ----> peek stage

As we all know, the Sort stage can remove duplicates. In this case, when I checked the $APT_DUMP_SCORE output for my job, I could not see a separate operator (remdup). Can I assume that the tsort operator performs both the sort and the duplicate removal, or is a remdup operator assigned internally to remove the duplicates?

Scenario 2:
Job Design:

Source Dataset ---> Sort stage ----> Remove Duplicates stage ----> peek stage

In this case I can see that separate operators have been assigned for the Sort stage and for the Remove Duplicates stage.

Which approach is better in terms of performance? Thanks in advance.

Thanks,
Ram
Knowledge is Fair,execution is matter!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

They're pretty close to identical in terms of performance. What's different is the functionality - with the Remove Duplicates stage you can specify which record to keep from each group (first or last) whereas with a unique sort you cannot specify which record to keep from each group.
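
To make the first/last distinction concrete, here is a minimal Python sketch of the idea. It is not DataStage code and does not reproduce how the tsort or remdup operators are implemented; the function name `remove_duplicates`, the `retain` parameter, and the sample rows are all hypothetical, chosen only for illustration. It assumes the input is already sorted on the key, as it would be after the Sort stage.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(sorted_rows, key, retain="first"):
    """Keep one row per key group from rows already sorted on that key.

    Hypothetical helper: `retain` chooses which row of each group survives.
    """
    if retain not in ("first", "last"):
        raise ValueError("retain must be 'first' or 'last'")
    kept = []
    # groupby only merges adjacent equal keys, hence the sorted-input requirement.
    for _, group in groupby(sorted_rows, key=key):
        members = list(group)
        kept.append(members[0] if retain == "first" else members[-1])
    return kept


# Made-up sample data: two rows per id, with different load dates.
rows = [
    {"id": 2, "loaded": "2009-01-03"},
    {"id": 1, "loaded": "2009-01-01"},
    {"id": 1, "loaded": "2009-01-02"},
    {"id": 2, "loaded": "2009-01-04"},
]

by_id = itemgetter("id")
# Python's sorted() is stable, so rows with equal ids keep their input order.
sorted_rows = sorted(rows, key=by_id)

kept_first = remove_duplicates(sorted_rows, by_id, retain="first")
kept_last = remove_duplicates(sorted_rows, by_id, retain="last")

print(kept_first)
print(kept_last)
```

Both calls return one row per id, but a different row from each group. That choice is what the Remove Duplicates stage exposes and a unique sort does not.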
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ramsubbiah
Participant
Posts: 40
Joined: Tue Nov 11, 2008 5:49 am

Post by ramsubbiah »

Hi Ray,
Thanks for your response.
Since I don't have a premium membership, I am not able to see your complete response. I will upgrade my membership and let you know the outcome.

Thanks,
Ram
Knowledge is Fair,execution is matter!