Remove duplicates

Posted: Sat Mar 27, 2010 1:35 pm
by mallikharjuna
Hi All,

My source is a Teradata table, and I have to remove duplicates based on one column.

I know a few ways to remove the duplicates:

1) User-defined source query
2) Remove Duplicates stage
3) Sort and Transformer stages

Of the three options above, which is best when the data volume is high? I know only these three; please suggest any other option that would give better performance.

Thanks in advance
Mallikharjuna Reddy

Posted: Sat Mar 27, 2010 1:47 pm
by ray.wurlod
If the load on the database server is light then use DISTINCT in your extraction query.
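One point worth keeping in mind with DISTINCT: it compares every selected column, so on its own it does not deduplicate "based on one column" unless that column is the only one selected. The sketch below illustrates the difference using Python's built-in sqlite3 module as a stand-in for Teradata; the table `src` and the columns `cust_id` and `city` are hypothetical, and the GROUP BY variant is just one possible way to get a single row per key.

```python
import sqlite3

# In-memory SQLite stands in for the Teradata source.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (cust_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, "Pune"), (1, "Pune"), (2, "Delhi"), (2, "Agra")],
)

# DISTINCT removes only rows that match on every selected column,
# so both rows for cust_id 2 survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT cust_id, city FROM src ORDER BY cust_id, city"
).fetchall()

# To keep one row per cust_id, aggregate the remaining columns.
one_per_key = conn.execute(
    "SELECT cust_id, MIN(city) FROM src GROUP BY cust_id ORDER BY cust_id"
).fetchall()

conn.close()
print(distinct_rows)
print(one_per_key)
```

Either query pushes the work to the database, which is why it suits the case where the database server has spare capacity.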

If the load on the DataStage server is light and you don't need to specify first or last from each group, use a Sort stage (unlike a link sort, the stage lets you control how much memory is allocated).
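Conceptually, sort-based deduplication orders the rows by the key column and then keeps one row from each run of equal keys. A minimal Python sketch of that idea follows; it is not DataStage's implementation, and the function name `dedupe_sorted` and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_sorted(rows, key_index=0):
    """Sort rows on one column, then keep one row per key value."""
    key = itemgetter(key_index)
    # sorted() is stable, so rows with equal keys keep their arrival order.
    ordered = sorted(rows, key=key)
    # groupby() yields each run of equal keys; take the first row of each run.
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [(2, "Delhi"), (1, "Pune"), (2, "Agra"), (1, "Pune")]
unique_rows = dedupe_sorted(rows)
print(unique_rows)
```

The sort dominates the cost here, which is why the memory available to it matters at high volumes.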