Hi,
We need to remove duplicate records based on one key column and a date value.
eid,date,name
23,2011-03-21,james
12,2011-04-10,mat
13,1982-03-03,Karth
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim
We need to create a file that contains, for each eid, the record with the max date.
Output.
--------
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim
Could someone please help with how to get the records based on max date?
Thanks.
Need to Delete duplicated records
-
- Participant
- Posts: 152
- Joined: Mon Mar 07, 2011 6:16 am
Re: Need to Delete duplicated records
Perform the following steps:
1. Sort on eid (ascending), then date (descending).
2. Remove duplicates on eid, retaining the first record of each group.
Hope it works.
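
For anyone who wants to check the logic outside DataStage, the two steps above can be sketched in plain Python. This is only an illustration: the `SAMPLE` text and the `latest_per_eid` function are made-up names, and the sample assumes the three-column layout (eid, date, name) with ISO-formatted dates.

```python
import csv
import io
from itertools import groupby

# Assumed sample input: eid, date (YYYY-MM-DD), name.
SAMPLE = """eid,date,name
23,2011-03-21,james
12,2011-04-10,mat
13,1982-03-03,Karth
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim
"""


def latest_per_eid(text):
    """Return one row per eid: the row with the greatest date."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Step 1: sort by eid ascending, then date descending.
    # Python's sort is stable, so sorting on the secondary key first
    # and the primary key second gives the compound ordering.
    # ISO dates compare correctly as plain strings.
    rows.sort(key=lambda r: r["date"], reverse=True)
    rows.sort(key=lambda r: int(r["eid"]))
    # Step 2: remove duplicates on eid, keeping the first row per group.
    return [next(group) for _, group in groupby(rows, key=lambda r: r["eid"])]


if __name__ == "__main__":
    for row in latest_per_eid(SAMPLE):
        print(",".join([row["eid"], row["date"], row["name"]]))
```

Because the newest date comes first within each eid, keeping the first row of every group is what selects the max date.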
Soumya
-
- Participant
- Posts: 91
- Joined: Wed Apr 20, 2005 7:59 pm
- Location: U.S.
Re: Need to Delete duplicated records
I think the source will be like this:
eid,date,name
23,2011-03-21,james
12,2011-04-10,mat
13,1982-03-03,Karth
23,2011-04-21,Maek
12,2011-04-23,Ojem
13,2011-05-13,Kim
In this case:
Seqfile---->Sort------->RemoveDup---->Dataset
Sort on Eid (descending), with date as a secondary key (ascending).
Remove duplicates on Eid, duplicate to retain = last.
This will give the required o/p.
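
A rough way to see why "retain = last" picks the max date is the small Python sketch below. It is not DataStage code: `ROWS` and `retain_last` are hypothetical names, and it assumes the rows within each Eid arrive in ascending date order, because the last row of a group is the newest only in that case.

```python
# Hypothetical in-memory copy of the source rows: (eid, date, name).
ROWS = [
    (23, "2011-03-21", "james"),
    (12, "2011-04-10", "mat"),
    (13, "1982-03-03", "Karth"),
    (23, "2011-04-21", "Maek"),
    (12, "2011-04-23", "Ojem"),
    (13, "2011-05-13", "Kim"),
]


def retain_last(rows):
    """Mimic sort + remove-duplicates with 'retain last' on eid."""
    # Date ascending first, then eid descending; the second sort is
    # stable, so dates stay ascending inside each eid group.
    ordered = sorted(rows, key=lambda r: r[1])
    ordered.sort(key=lambda r: r[0], reverse=True)
    latest = {}
    for row in ordered:
        # A later row for the same eid overwrites the earlier one,
        # so the last row of each group is the one that survives.
        latest[row[0]] = row
    return list(latest.values())


if __name__ == "__main__":
    for eid, date, name in retain_last(ROWS):
        print(f"{eid},{date},{name}")
```

If the date is left out of the sort, the surviving row is simply whichever one came last in the input, which need not be the max date.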
Ram..................
ANJI