Remove duplicates

Post by **mallikharjuna** » Sat Mar 27, 2010 1:35 pm

Hi All,

My source is a Teradata table, and I have to remove duplicates based on one column.

I know a few ways to remove the duplicates:

1) User-defined source query
2) Remove Duplicates stage
3) Sort stage followed by a Transformer stage
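Option 3 works because sorting brings all rows with the same key together, so a single pass only has to compare each row's key with the previous one and keep the row when the key changes. Below is a minimal Python sketch of that idea, not DataStage code: Python stands in for the Sort and Transformer stages, and the column names `cust_id` and `amt` are hypothetical.

```python
from operator import itemgetter


def dedup_sorted_stream(rows, key):
    """Keep the first row of each key group (hypothetical helper).

    Mimics a sort on the key followed by a transformer-style pass that
    compares each row's key with the previous row's key.
    """
    # Step 1: sort on the dedup key. sorted() is stable, so rows with
    # equal keys keep their original input order.
    ordered = sorted(rows, key=itemgetter(key))
    out = []
    prev_key = object()  # sentinel that never equals a real key value
    for row in ordered:
        # Step 2: emit a row only when the key differs from the previous one.
        if row[key] != prev_key:
            out.append(row)
        prev_key = row[key]
    return out


if __name__ == "__main__":
    sample = [
        {"cust_id": 2, "amt": 10},
        {"cust_id": 1, "amt": 5},
        {"cust_id": 2, "amt": 7},
        {"cust_id": 1, "amt": 9},
    ]
    print(dedup_sorted_stream(sample, "cust_id"))
```

The single pass only ever remembers one previous key, which is why this pattern scales to large volumes once the data is sorted; the sort itself is the expensive part.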

Of the three options above, which one is best when data volumes are high? These are the only options I know, so please suggest any other option that would give better performance.

Thanks in advance
Mallikharjuna Reddy

Post by **ray.wurlod** » Sat Mar 27, 2010 1:47 pm

If the load on the database server is light, use DISTINCT in your extraction query.
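With DISTINCT in the extraction query, the database does the deduplication and only unique rows travel to DataStage. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Teradata; the table `src_orders` and its columns are hypothetical. Note that DISTINCT compares every selected column, so rows that share `cust_id` but differ in another selected column are both kept.

```python
import sqlite3


def build_demo_source():
    """Create an in-memory table standing in for the source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (cust_id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO src_orders (cust_id, region) VALUES (?, ?)",
        [(2, "N"), (1, "S"), (2, "N"), (1, "S"), (3, "E")],
    )
    return conn


def extract_distinct(conn):
    # The database removes duplicates before any rows leave the server.
    # DISTINCT applies to the whole selected row, not just cust_id.
    cur = conn.execute(
        "SELECT DISTINCT cust_id, region FROM src_orders ORDER BY cust_id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(extract_distinct(build_demo_source()))
```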

If the load on the DataStage server is light and you don't need to choose whether to keep the first or the last row from each group, use a Sort stage rather than a link sort, because the stage lets you control memory allocation.
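The "first or last from each group" condition matters because, once rows are sorted on the key, keeping one row per key is cheap, but keeping a *specific* row depends on the order of rows within each group. This Python sketch (not DataStage code; the helper name `dedup_by_key` and its `retain` parameter are hypothetical) makes the difference visible:

```python
from itertools import groupby
from operator import itemgetter


def dedup_by_key(rows, key, retain="first"):
    """Return one row per key value (hypothetical helper).

    A stable sort keeps input order within each key group, so "first" and
    "last" refer to the original order of rows sharing the same key.
    """
    if retain not in ("first", "last"):
        raise ValueError("retain must be 'first' or 'last'")
    ordered = sorted(rows, key=itemgetter(key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        members = list(group)
        result.append(members[0] if retain == "first" else members[-1])
    return result


if __name__ == "__main__":
    sample = [
        {"cust_id": 2, "amt": 10},
        {"cust_id": 1, "amt": 5},
        {"cust_id": 2, "amt": 7},
        {"cust_id": 1, "amt": 9},
    ]
    print(dedup_by_key(sample, "cust_id", retain="first"))
    print(dedup_by_key(sample, "cust_id", retain="last"))
```

If either row of a group is acceptable, the `retain` choice is irrelevant and a plain sort with duplicate removal is enough.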