Remove duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

mallikharjuna
Participant
Posts: 81
Joined: Thu Nov 30, 2006 7:46 am
Location: india

Post by mallikharjuna »

Hi All,

My source is a Teradata table, and I have to remove duplicates based on one column.

I know a few ways to remove duplicates:

1) User-defined source query
2) Remove Duplicates stage
3) Sort stage followed by a Transformer stage

Which of these three options is best when the data volumes are high? These are the only options I know, so please suggest any other option that would give better performance.

Thanks in advance
Mallikharjuna Reddy
MALLI
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If the load on the database server is light, use DISTINCT in your extraction query.
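
One point to keep in mind when deduplicating on a single column: DISTINCT compares every selected column, so rows that share the key but differ elsewhere all survive. The sketch below is only an illustration, using an in-memory SQLite database as a stand-in for the Teradata source. The table `src` and the columns `cust_id`, `name` and `load_seq` are hypothetical, and the "keep the lowest `load_seq` per key" rule is an assumption about which row should win.

```python
import sqlite3

# In-memory SQLite stands in for the Teradata source (assumption).
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (cust_id INTEGER, name TEXT, load_seq INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(1, "Ann", 1), (1, "Anne", 2), (2, "Bob", 3), (2, "Bob", 4)],
)

# DISTINCT looks at all selected columns, so cust_id 1 still appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT cust_id, name FROM src ORDER BY cust_id, name"
).fetchall()

# One row per cust_id: keep the row with the lowest load_seq in each group.
one_per_key = conn.execute(
    """
    SELECT s.cust_id, s.name
    FROM src AS s
    JOIN (SELECT cust_id, MIN(load_seq) AS first_seq
          FROM src
          GROUP BY cust_id) AS f
      ON s.cust_id = f.cust_id AND s.load_seq = f.first_seq
    ORDER BY s.cust_id
    """
).fetchall()
conn.close()

print(distinct_rows)
print(one_per_key)
```

On Teradata itself, a QUALIFY clause with ROW_NUMBER() partitioned by the key column is a common way to express the same "one row per key" rule in the user-defined query.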

If the load on the DataStage server is light and you don't need to specify first or last from each group, use a Sort stage (with the stage, rather than a link sort, you can control how much memory is allocated).
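
As a rough illustration of what sort-based duplicate removal does (this is not DataStage code, only a Python sketch of the idea), the rows are sorted on the key column so that duplicates sit next to each other, and then one row per key value is kept. The function name `dedupe_sorted` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_sorted(rows, key_index=0):
    """Sort rows on the key column, then keep one row per key value."""
    get_key = itemgetter(key_index)
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=get_key)
    # groupby() clusters adjacent rows with the same key; take the first of each.
    return [next(group) for _, group in groupby(ordered, key=get_key)]


rows = [(2, "Bob"), (1, "Ann"), (1, "Anne"), (2, "Bobby")]
deduped = dedupe_sorted(rows)
print(deduped)
```

Because the sort brings equal keys together, the duplicate check only ever compares neighbouring rows, which is why the sort itself tends to dominate the cost at high volumes.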
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.