remove duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vamsipx
Participant
Posts: 3
Joined: Wed Aug 22, 2007 11:31 pm

remove duplicates

Post by vamsipx »

Hi all,

I am new to this environment and I want to know how I can remove duplicates in a Transformer stage.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

It can be done in the Transformer stage using stage variables. But is there any specific reason for you to do this in the Transformer stage when you have a Remove Duplicates stage?

In either case (Transformer or Remove Duplicates), you will have to hash partition on the fields on which you are removing duplicates and also sort the data on the same fields.
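To illustrate why the hash partitioning matters, here is a minimal Python sketch (a simulation, not DataStage code) of a parallel job: rows are split across partitions, each partition is sorted and deduplicated on its own, and the results are collected. The function names and sample rows are hypothetical. With hash partitioning on the key, every copy of a key lands in the same partition; with round-robin partitioning, copies of a key can end up in different partitions and survive the per-partition dedup.

```python
# Illustrative simulation only; this is not DataStage code or a DataStage API.

def key_of(row, key):
    """Build a comparable key from the named key columns."""
    return tuple(row[c] for c in key)

def hash_partition(rows, key, n):
    """Send each row to partition hash(key) % n, so equal keys share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key_of(row, key)) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Deal rows out to partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def dedup_partition(rows, key):
    """Sort one partition on the key and keep the first row of each run of equal keys."""
    out, prev = [], object()  # the sentinel never equals a real key
    for row in sorted(rows, key=lambda r: key_of(r, key)):
        k = key_of(row, key)
        if k != prev:
            out.append(row)
        prev = k
    return out

def run_job(parts, key):
    """Deduplicate each partition independently, then collect all partitions."""
    return [row for part in parts for row in dedup_partition(part, key)]

rows = [
    {"cust_id": 1, "name": "A"},
    {"cust_id": 2, "name": "B"},
    {"cust_id": 1, "name": "A2"},
    {"cust_id": 3, "name": "C"},
    {"cust_id": 2, "name": "B2"},
]

hashed = run_job(hash_partition(rows, ["cust_id"], 2), ["cust_id"])
dealt = run_job(round_robin_partition(rows, 2), ["cust_id"])
print(sorted(r["cust_id"] for r in hashed))  # [1, 2, 3]
print(sorted(r["cust_id"] for r in dealt))   # [1, 2, 2, 3]: cust_id 2 was split across partitions
```

The round-robin case shows the failure mode: each partition is correctly deduplicated on its own, but nothing compares rows across partitions.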
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Re: remove duplicates

Post by DSRajesh »

vamsipx wrote:
Hi all,

I am new to this environment and I want to know how I can remove duplicates in a Transformer stage.
You can eliminate duplicates using stage variables in the Transformer stage.
Write logic to check the equality of consecutive rows.
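A minimal Python sketch of that stage-variable logic, emulating a Transformer that evaluates its stage variables top to bottom for each row. The stage variable names svIsDup and svPrevKey are hypothetical. The important detail is ordering: the comparison has to be evaluated before the previous-key variable is overwritten with the current row's key. The input is assumed to be already sorted (and, in a parallel job, hash partitioned) on the key columns.

```python
# Emulates Transformer stage variables; names are hypothetical, not DataStage API.
#   svIsDup   = (current key = svPrevKey)   <- evaluated first
#   svPrevKey = current key                 <- evaluated second
#   Output link constraint: Not(svIsDup)

def transformer_dedup(rows, key):
    """Keep a row only if its key differs from the previous row's key."""
    sv_prev_key = None  # stands in for an initial value that can never match a real key
    out = []
    for row in rows:
        current = tuple(row[c] for c in key)
        sv_is_dup = current == sv_prev_key  # compare BEFORE updating sv_prev_key
        sv_prev_key = current
        if not sv_is_dup:
            out.append(row)
    return out

sorted_rows = [
    {"id": 1, "v": "a"}, {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
    {"id": 3, "v": "d"}, {"id": 3, "v": "e"},
]
print([r["v"] for r in transformer_dedup(sorted_rows, ["id"])])  # ['a', 'c', 'd']

# Unsorted input: only adjacent duplicates are caught, so id 1 survives twice.
unsorted_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
print([r["v"] for r in transformer_dedup(unsorted_rows, ["id"])])  # ['a', 'b', 'c']
```

If the two stage variables were evaluated in the opposite order, every row would compare equal to its own key and the constraint would drop everything.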

Let me know if you have any queries.

Regards
Rajesh Devabhaktuni
RD
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

As Maveric says, using a Remove Duplicates stage is the easiest.
You don't even have to change the partitioning (set it to Auto, which is the default, and DataStage takes care of the partitioning).

If there is a specific reason, such as capturing rejects, you can always remove duplicates by using a Sort stage (with Create Key Change Column set to True) followed by a Transformer that rejects rows whose key change value is 0.
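A minimal Python sketch of that Sort plus Transformer pattern (a simulation, not DataStage code). It assumes the Sort stage's Create Key Change Column option adds a column, called keyChange here, that is 1 for the first row of each key group and 0 for the rows that follow; the Transformer then sends keyChange = 1 rows to the output link and keyChange = 0 rows to a reject link, so the duplicates are captured rather than silently dropped. The function names and sample data are hypothetical.

```python
# Simulation of a Sort stage (Create Key Change Column = True) feeding a
# Transformer with an output link and a reject link. Not DataStage code.

def sort_with_key_change(rows, key):
    """Sort on the key and flag the first row of each key group with keyChange = 1."""
    out, prev = [], object()  # the sentinel never equals a real key
    for row in sorted(rows, key=lambda r: tuple(r[c] for c in key)):
        k = tuple(row[c] for c in key)
        out.append({**row, "keyChange": 1 if k != prev else 0})
        prev = k
    return out

def transformer_split(rows):
    """Route first-of-group rows to the output and later duplicates to the rejects."""
    output, rejects = [], []
    for row in rows:
        if row["keyChange"] == 1:
            output.append(row)
        else:
            rejects.append(row)
    return output, rejects

rows = [
    {"id": 3, "v": "x"}, {"id": 1, "v": "y"}, {"id": 3, "v": "z"},
    {"id": 2, "v": "w"}, {"id": 1, "v": "q"},
]
output, rejects = transformer_split(sort_with_key_change(rows, ["id"]))
print([(r["id"], r["v"]) for r in output])   # [(1, 'y'), (2, 'w'), (3, 'x')]
print([(r["id"], r["v"]) for r in rejects])  # [(1, 'q'), (3, 'z')]
```

Which copy of a key is kept depends on the row order within each key group after sorting; Python's sort is stable, so here the first copy in input order wins.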

IHTH
Minhajuddin
