Need to delete duplicate records

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rajudx
Participant
Posts: 45
Joined: Tue Nov 14, 2006 1:58 pm
Location: NJ


Post by rajudx

Hi,

We need to remove duplicate data based on one key column and a date value.

eid,date,name
23,2011-03-21,james
12,2011-04-10,mat
13,1982-03-03,Karth
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim

We need to create a file with one record per eid: the one with the max date.

Output:
--------
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim

Can someone please help with how to get the records based on the max date?

Thanks.
Ran
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am


Post by soumya5891

Try the following:
1. Sort on eid (ascending), then date (descending).
2. Run a Remove Duplicates on eid, keeping the first record of each group.
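The record logic of those two steps can be sketched in Python. This is not a DataStage job, just an illustration of what Sort plus Remove Duplicates should produce; it assumes the corrected sample data (eid, date, name) with ISO dates, which compare correctly as strings. The function name `latest_per_eid` is hypothetical.

```python
# Corrected sample data: (eid, date, name). ISO dates sort correctly as plain strings.
ROWS = [
    (23, "2011-03-21", "james"),
    (12, "2011-04-10", "mat"),
    (13, "1982-03-03", "Karth"),
    (23, "2011-04-21", "Maek"),
    (12, "2011-04-23", "Ojem"),
    (13, "2011-05-13", "Kim"),
]

def latest_per_eid(rows):
    """Sort by eid ascending and date descending, then keep the first row per eid."""
    # Two stable sorts: secondary key (date, descending) first, then primary key (eid).
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ordered = sorted(ordered, key=lambda r: r[0])
    result = []
    previous_eid = None
    for row in ordered:
        if row[0] != previous_eid:  # first row of a new eid group has the latest date
            result.append(row)
            previous_eid = row[0]
    return result

for row in latest_per_eid(ROWS):
    print(row)
```

Because the date is sorted descending within each eid, "keep first" and "max date" mean the same thing here.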

Hope this works.
Soumya
rajudx
Participant
Posts: 45
Joined: Tue Nov 14, 2006 1:58 pm
Location: NJ

Post by rajudx

No, it's not working; the duplicate records are not being removed.
Ran
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891

Did you partition the data properly (on eid)?
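Why partitioning matters can be shown with a small simulation. This is a Python sketch under stated assumptions, not DataStage itself: the two-partition setup and the helpers `round_robin`, `key_partition` and `dedup_within_partition` are hypothetical, and `eid % n` stands in for a real hash function. Each partition is deduplicated on its own, so if rows with the same eid land in different partitions, both copies survive.

```python
# Corrected sample data: (eid, date, name). ISO dates compare correctly as strings.
ROWS = [
    (23, "2011-03-21", "james"),
    (12, "2011-04-10", "mat"),
    (13, "1982-03-03", "Karth"),
    (23, "2011-04-21", "Maek"),
    (12, "2011-04-23", "Ojem"),
    (13, "2011-05-13", "Kim"),
]

def dedup_within_partition(rows):
    """Keep the latest-dated row per eid, looking only at this partition's rows."""
    best = {}
    for eid, date, name in rows:
        if eid not in best or date > best[eid][1]:
            best[eid] = (eid, date, name)
    return list(best.values())

def round_robin(rows, n):
    """Deal rows to partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def key_partition(rows, n):
    """Send all rows with the same eid to the same partition (eid % n stands in for a hash)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts

def run(partitioner, n=2):
    """Partition, dedup each partition independently, then collect the results."""
    output = []
    for part in partitioner(ROWS, n):
        output.extend(dedup_within_partition(part))
    return sorted(output)

print(len(run(round_robin)))  # 6: every duplicate survives
print(run(key_partition))     # one row per eid
```

With round-robin partitioning, each partition happens to hold only one row per eid, so nothing is removed; partitioning on eid brings the duplicates together first.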
Soumya
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B

Is that the complete data you're working with, just three columns? If so, group by eid and name and take the max date.
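The aggregation idea, taking the max date per group, can be sketched in Python (not a DataStage job) on the corrected sample data; `max_date_by` is a hypothetical helper. One thing to check: in this sample every eid/name pair is distinct, so grouping on both columns leaves all six rows. Grouping on eid alone and then joining back on eid and the max date recovers the name of the latest row.

```python
# Corrected sample data: (eid, date, name). ISO dates compare correctly as strings.
ROWS = [
    (23, "2011-03-21", "james"),
    (12, "2011-04-10", "mat"),
    (13, "1982-03-03", "Karth"),
    (23, "2011-04-21", "Maek"),
    (12, "2011-04-23", "Ojem"),
    (13, "2011-05-13", "Kim"),
]

def max_date_by(rows, key):
    """Return {group key: latest date} for the given grouping key."""
    latest = {}
    for row in rows:
        k = key(row)
        if k not in latest or row[1] > latest[k]:
            latest[k] = row[1]
    return latest

# Grouping on (eid, name): every pair is distinct here, so all six groups remain.
by_eid_and_name = max_date_by(ROWS, key=lambda r: (r[0], r[2]))

# Grouping on eid alone, then joining back on (eid, max date) to recover the name.
max_by_eid = max_date_by(ROWS, key=lambda r: r[0])
latest_rows = sorted(r for r in ROWS if r[1] == max_by_eid[r[0]])

print(len(by_eid_and_name))
print(latest_rows)
```

If two rows for the same eid share the max date, the join-back keeps both of them.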
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar

Do this:
1. Use a Remove Duplicates stage.
2. On the input of the Remove Duplicates stage, sort on eid and date, both ascending. Make sure you partition and sort on eid, and only sort (don't partition) on date.
3. Keep the last row (Duplicate To Retain = Last).

You will get the desired result.
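The "sort ascending, keep last" variant can be sketched in Python as well. This only models the record logic, not the DataStage stage; it assumes the corrected sample data, and `keep_last_per_eid` is a hypothetical name.

```python
# Corrected sample data: (eid, date, name). ISO dates compare correctly as strings.
ROWS = [
    (23, "2011-03-21", "james"),
    (12, "2011-04-10", "mat"),
    (13, "1982-03-03", "Karth"),
    (23, "2011-04-21", "Maek"),
    (12, "2011-04-23", "Ojem"),
    (13, "2011-05-13", "Kim"),
]

def keep_last_per_eid(rows):
    """Sort ascending on (eid, date), then retain the last row seen for each eid."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    last = {}
    for row in ordered:
        last[row[0]] = row  # a later row overwrites the earlier one: "retain last"
    return list(last.values())

print(keep_last_per_eid(ROWS))
```

Sorting the date ascending and keeping the last row is equivalent to sorting it descending and keeping the first.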
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett

Are you looking for a Server or a PX solution? You've posted in the PX forum but marked your post as Server, hence the question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ds_dwh
Participant
Posts: 39
Joined: Fri May 14, 2010 6:06 am


Post by ds_dwh

I think the source data looks like this:

eid,date,name
23,2011-03-21,james
12,2011-04-10,mat
13,1982-03-03,Karth
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim

In this case:
Seqfile ---> Sort ---> RemoveDup ---> Dataset

Sort on eid (descending).
Remove Duplicates on eid, with Duplicate To Retain = Last.

This will give the required output.
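Sorting on eid alone gives the right answer for this sample because, within each eid, the rows already arrive in ascending date order, and a stable sort keeps that order for equal keys. A Python sketch (not DataStage) illustrates this under that assumption: Python's `sorted` is stable, and the function names and the reversed "shuffled" input are hypothetical. Reversing the arrival order breaks the eid-only version, while adding date as a secondary sort key does not.

```python
# Corrected sample data: (eid, date, name). ISO dates compare correctly as strings.
ROWS = [
    (23, "2011-03-21", "james"),
    (12, "2011-04-10", "mat"),
    (13, "1982-03-03", "Karth"),
    (23, "2011-04-21", "Maek"),
    (12, "2011-04-23", "Ojem"),
    (13, "2011-05-13", "Kim"),
]
EXPECTED = [(12, "2011-04-23", "Ojem"), (13, "2011-05-13", "Kim"), (23, "2011-04-21", "Maek")]

def keep_last(ordered):
    """Retain the last row seen for each eid, returned in eid order."""
    last = {}
    for row in ordered:
        last[row[0]] = row
    return sorted(last.values())

def sort_on_eid_only_keep_last(rows):
    # Stable sort on eid (descending) only: ties keep their input order.
    return keep_last(sorted(rows, key=lambda r: r[0], reverse=True))

def sort_on_eid_and_date_keep_last(rows):
    # eid descending, then date ascending, so the last row per eid has the max date.
    return keep_last(sorted(rows, key=lambda r: (-r[0], r[1])))

shuffled = list(reversed(ROWS))  # same rows, different arrival order

print(sort_on_eid_only_keep_last(ROWS) == EXPECTED)          # True
print(sort_on_eid_only_keep_last(shuffled) == EXPECTED)      # False
print(sort_on_eid_and_date_keep_last(shuffled) == EXPECTED)  # True
```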


Ram
ANJI