Hi All,
I need to delete all occurrences of duplicates from a Sequential File used as input.
Input.txt
RNO|NAME|GRP|AGE
1|S|10|52
2|X|10|52
3|Y|20|52
1|Z|10|52
4|A|30|52
My desired output is:
RNO|NAME|GRP|AGE
2|X|10|52
3|Y|20|52
4|A|30|52
I achieved this by using a shell script, loading the data into a DB, and using HAVING COUNT(1) > 1.
Still, is there a better way to achieve this using any of the parallel stages?
Thanks in Advance
Regards,
Sekhar
How to delete all occurrences of duplicates from seq file
Re: How to delete all occurrences of duplicates from seq file
I assume that you take the highest RNO as your surviving record if duplication occurs in GRP and AGE.
In this case, I would sort the records by GRP and AGE, followed by RNO. After that, use the Remove Duplicates stage with GRP and AGE as the key and retain first/last (depending on whether you sort RNO ascending or descending).
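To make the behaviour of that suggestion concrete, here is a rough Python sketch of the sort followed by "retain last" logic, run on the sample from the first post. It is not DataStage code; the function name and the `retain` parameter are made up for illustration, and it assumes RNO, GRP and AGE are integers.

```python
from itertools import groupby

SAMPLE = """RNO|NAME|GRP|AGE
1|S|10|52
2|X|10|52
3|Y|20|52
1|Z|10|52
4|A|30|52"""


def sort_and_remove_duplicates(text, retain="last"):
    """Hypothetical stand-in for Sort + Remove Duplicates on key (GRP, AGE)."""
    header, *records = [line.split("|") for line in text.splitlines()]
    # Sort step: GRP, AGE, then RNO ascending (columns 2, 3 and 0).
    records.sort(key=lambda r: (int(r[2]), int(r[3]), int(r[0])))
    survivors = []
    # Remove-duplicates step: exactly one survivor per (GRP, AGE) group.
    for _, group in groupby(records, key=lambda r: (r[2], r[3])):
        rows = list(group)
        survivors.append(rows[-1] if retain == "last" else rows[0])
    return ["|".join(r) for r in [header] + survivors]


if __name__ == "__main__":
    print("\n".join(sort_and_remove_duplicates(SAMPLE)))
```

Note that this always keeps one row per key. It lines up with the desired output on this particular sample when retaining the last row, but it does not remove every occurrence of a duplicated key in general.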
Re: How to delete all occurrences of duplicates from seq file
I want to delete all occurrences...
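This is a classic fork-join design, as others have indicated. Split the stream in two, count the rows per key on one branch (for example with an Aggregator stage), and join the counts back to the detail rows. Downstream of the Join stage use a Filter stage or Transformer stage to pass only those records that have a count of 1.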
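As a rough illustration of the fork-join logic outside DataStage, here is a minimal Python sketch. It assumes the duplicate key is RNO (the first column), which is inferred from the sample data rather than stated in the thread; the function name and `key_index` parameter are hypothetical.

```python
from collections import Counter

SAMPLE = [
    "RNO|NAME|GRP|AGE",
    "1|S|10|52",
    "2|X|10|52",
    "3|Y|20|52",
    "1|Z|10|52",
    "4|A|30|52",
]


def drop_all_duplicates(lines, key_index=0):
    """Keep only records whose key occurs exactly once (key column assumed)."""
    header, *records = [line.split("|") for line in lines]
    # Fork, branch 1: count the rows per key (the aggregation branch).
    counts = Counter(rec[key_index] for rec in records)
    # Join + filter: look up each row's count and pass only count == 1.
    kept = [rec for rec in records if counts[rec[key_index]] == 1]
    return ["|".join(rec) for rec in [header] + kept]


if __name__ == "__main__":
    print("\n".join(drop_all_duplicates(SAMPLE)))
```

Unlike a remove-duplicates step, this drops every row of a duplicated key and preserves the input order of the remaining rows.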
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.