Hi All,
My source is a Teradata table, and I have to remove duplicates based on one column.
I know a few ways to remove the duplicates:
1) User-defined source query
2) Remove Duplicates stage
3) Sort and Transformer stages
Of these three options, which is the best when the data volume is high? These are the only options I know; please suggest any other option that would give better performance.
Thanks in advance
Mallikharjuna Reddy
Remove duplicates
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 81
- Joined: Thu Nov 30, 2006 7:46 am
- Location: india
MALLI
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
If the load on the database server is light then use DISTINCT in your extraction query.
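One caveat worth illustrating: DISTINCT compares whole rows, so it only removes duplicates "based on one column" if the other selected columns also match. The sketch below uses an in-memory SQLite database as a stand-in for Teradata, and the table and column names (src, cust_id, city) are made up for the example. On Teradata itself, a QUALIFY clause with ROW_NUMBER() is probably the more usual way to keep one row per key, but that is not shown here.

```python
import sqlite3

# In-memory SQLite stands in for the Teradata source (an assumption);
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (cust_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, "Pune"), (1, "Pune"), (1, "Delhi"), (2, "Sydney")],
)

# DISTINCT compares entire rows, so cust_id 1 still appears twice
# because its two rows differ in the city column.
distinct_rows = conn.execute(
    "SELECT DISTINCT cust_id, city FROM src ORDER BY cust_id, city"
).fetchall()

# One row per cust_id: group on the key and aggregate the other column.
one_per_key = conn.execute(
    "SELECT cust_id, MIN(city) FROM src GROUP BY cust_id ORDER BY cust_id"
).fetchall()

print(distinct_rows)
print(one_per_key)
```

If only the key column is selected, plain DISTINCT is enough; the grouping form matters when other columns must come along.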
If the load on the DataStage server is light and you don't need to specify first or last from each group, use a Sort stage (you can control allocation of memory using the stage rather than a link sort).
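To make the sort-based approach concrete, here is a rough Python sketch of the logic only: sort on the key column, then keep one row per key. It is not DataStage code, and the function name dedupe_sorted and the sample rows are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_sorted(rows, key_index=0):
    """Sort rows on the key column, then keep the first row of each key group."""
    key = itemgetter(key_index)
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=key)
    # groupby() yields one group per run of equal keys in the sorted data.
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [(2, "b"), (1, "a"), (2, "c"), (1, "d")]
result = dedupe_sorted(rows)
print(result)
```

Which row survives in each group depends entirely on the sort order, so if a particular first or last row is required, the sort needs a secondary key to make that order explicit.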
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.