Hi,
Could you please let us know how we can remove duplicates in DataStage jobs, and which UNIX command does this job?
Thanks
Nraj
How to Remove Duplicates from a Flat File?
There could be many solutions. One would be: read each whole row of the sequential file as a single column, sort, then use a Transformer with a stage variable that holds the previous row, and write the incoming row to another sequential file only if it differs from the previous row.
Then copy the new file over the source file (with the overwrite option) in an after-job subroutine.
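The same logic can be sketched at a UNIX prompt. This is a rough emulation, not the DataStage job itself; the file name and sample rows are made up, and the whole row is treated as the comparison key:

```shell
# Hypothetical input file with duplicate rows.
printf 'b\na\nb\na\nc\n' > source.txt

# Sort, then keep a row only when it differs from the previous row.
# The awk 'prev' variable plays the role of the stage variable that
# holds the previous row in the Transformer.
sort source.txt | awk 'NR == 1 || $0 != prev { print } { prev = $0 }' > dedup.txt

# Overwrite the source file, as the after-job subroutine would.
mv dedup.txt source.txt
cat source.txt   # a, b, c -- one copy of each
```

Note that `sort | uniq` (or `sort -u`, as mentioned below in this thread) collapses the sort-and-compare-previous-row steps into one command.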
Success consists of getting up just one more time than you fall.
You need to define "duplicate". This answer uses "has the same key column(s)" as the definition.
The UNIX command is sort -u (plus any other command-line options needed).
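For illustration, a minimal run of sort -u (the sample rows are made up):

```shell
# Whole lines as the duplicate key:
printf 'b\na\nb\n' | sort -u    # prints: a, b

# Duplicates defined by key column(s) only, e.g. column 1 of a
# comma-delimited file. Which of the duplicate rows survives is
# not guaranteed, since plain sort need not be stable:
printf '1,x\n1,y\n2,z\n' | sort -t, -k1,1 -u
```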
Most server-job developers use a hashed file to remove duplicates, relying on the fact that any write to a hashed file with an existing key is a destructive overwrite.
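That hashed-file behaviour can be mimicked outside DataStage with an awk associative array. This is only a sketch of the idea (column 1 is an assumed key and the rows are made up), not the stage itself:

```shell
# Each assignment to row[$1] overwrites any earlier row stored under the
# same key -- just as a hashed-file write with an existing key is a
# destructive overwrite -- so the LAST row per key survives.
printf '1,old\n2,b\n1,new\n' |
  awk -F, '{ row[$1] = $0 } END { for (k in row) print row[k] }' |
  sort   # awk's for-in order is unspecified, so sort for a stable display
```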
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.