remove duplicates

vamsipx · Post by **vamsipx** » Thu Oct 18, 2007 3:00 am

Hi all,

I am new to this environment and I want to know how I can remove duplicates in the Transformer stage.

Maveric · Post by **Maveric** » Thu Oct 18, 2007 3:47 am

It can be done in the Transformer stage using stage variables. But is there any specific reason to do this in the Transformer stage when you have a Remove Duplicates stage?

In either case, Transformer or Remove Duplicates, you will have to hash partition on the fields on which you are removing duplicates and also sort the data on the same fields.
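To illustrate why both steps matter, here is a rough sketch in Python (not DataStage code). It simulates hash partitioning on the key, sorting within each partition, and then a stage-variable-style comparison against the previous row's key. The function names, the `id` field, and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Send every row with the same key value to the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return list(partitions.values())


def dedupe_sorted(rows, key):
    """Keep the first row of each key group in already-sorted input.

    prev_key plays the role of a stage variable holding the previous row's key.
    """
    kept = []
    prev_key = object()  # sentinel that never equals a real key value
    for row in rows:
        if row[key] != prev_key:
            kept.append(row)
        prev_key = row[key]
    return kept


def remove_duplicates(rows, key, num_partitions=2):
    """Partition, sort each partition on the key, then drop repeated keys."""
    result = []
    for part in hash_partition(rows, key, num_partitions):
        ordered = sorted(part, key=lambda r: r[key])  # stable sort
        result.extend(dedupe_sorted(ordered, key))
    return result


# Hypothetical sample data
rows = [
    {"id": 2, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
    {"id": 1, "v": "d"},
]
deduped = remove_duplicates(rows, "id")
```

Without the hash partitioning, two rows with the same key could land in different partitions and both survive. Without the sort, the previous-row comparison would miss duplicates that are not adjacent.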

DSRajesh · Post by **DSRajesh** » Sat Nov 24, 2007 6:40 am

vamsipx wrote: Hi all,

I am new to this environment and I want to know how I can remove duplicates in the Transformer stage.

You can eliminate duplicates using stage variables in the Transformer stage.
Write logic to check rows for equality.

Let me know if you have any queries.

Regards
Rajesh Devabhaktuni

Minhajuddin · Post by **Minhajuddin** » Sat Nov 24, 2007 8:12 am

As Maveric says, using a Remove Duplicates stage is the easiest.
You don't even have to change the partitioning (set it to Auto, which is the default, and DataStage takes care of partitioning).

If there is a specific reason, like capturing rejects or something else, you can always remove duplicates by using a Sort stage (with Create Key Change Column set to True) and a Transformer where you reject any row whose key change value is 0.
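Roughly, the Sort plus Transformer approach behaves like the Python sketch below (again, a simulation, not DataStage code). The `keyChange` column name, the function names, and the sample rows are assumptions for illustration. The flag is 1 on the first row of each key group and 0 on the rows that repeat the key.

```python
def sort_with_key_change(rows, key):
    """Sort on the key and flag the first row of each key group with 1."""
    ordered = sorted(rows, key=lambda r: r[key])  # stable sort
    flagged = []
    prev_key = object()  # sentinel that never equals a real key value
    for row in ordered:
        flag = 1 if row[key] != prev_key else 0
        flagged.append({**row, "keyChange": flag})
        prev_key = row[key]
    return flagged


def split_on_key_change(rows):
    """Route first-of-group rows to the output and the rest to rejects."""
    kept = [r for r in rows if r["keyChange"] == 1]
    rejects = [r for r in rows if r["keyChange"] == 0]
    return kept, rejects


# Hypothetical sample data
rows = [
    {"id": 2, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
]
kept, rejects = split_on_key_change(sort_with_key_change(rows, "id"))
```

The advantage over a plain Remove Duplicates stage is that the dropped rows are still available on the reject side for logging or auditing.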

IHTH
