How can I remove duplicates from a sequential file?

suresh_dsx · Post by **suresh_dsx** » Mon May 08, 2006 11:33 am

Hi guys,
I am having trouble removing duplicates from a sequential file.
My job looks like this:

Sequential_file1------------>Transformer-------------->Sequential_file2
(source)

Could anyone please suggest a solution?

kris007 · Post by **kris007** » Mon May 08, 2006 11:47 am

Welcome Aboard,

Load all your data into a hashed file in order to remove duplicates based on a key column. Rows written with the same key overwrite one another, so only one row per key remains.
So your job design looks like this:

SequentialFile--------->Transformer---->HashedFile.

Kris

DSguru2B · Post by **DSguru2B** » Mon May 08, 2006 12:14 pm

As Kris mentioned, use the hashed file, and from there transfer the data back into a sequential file.
Also, decide carefully what counts as a duplicate.
For example:
If you want to remove records that are duplicated in their entirety, define all the columns as keys.
If you just want to remove duplicates based on a certain column, define that column as the key.
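
To make the effect of the key choice concrete, here is a rough sketch in Python rather than DataStage. It models the hashed file as a dictionary in which a later row with the same key overwrites the earlier one, which is an assumption about how the hashed file is being written. The function name `dedupe_rows`, the `key_columns` parameter, and the sample data are all made up for illustration.

```python
import csv
import io


def dedupe_rows(rows, key_columns=None):
    """Keep one row per key; a later row with the same key replaces the earlier one.

    key_columns=None treats the entire record as the key (all columns are keys).
    Otherwise key_columns is a list of column positions that form the key.
    """
    unique = {}
    for row in rows:
        if key_columns is None:
            key = tuple(row)
        else:
            key = tuple(row[i] for i in key_columns)
        unique[key] = row  # same key: overwrite, like a keyed write
    return list(unique.values())


# Hypothetical sample data standing in for the source sequential file.
sample = "1,alice,NY\n2,bob,LA\n1,alice,NY\n1,alice,SF\n"
rows = list(csv.reader(io.StringIO(sample)))

# All columns as keys: only the exact repeat of "1,alice,NY" is dropped.
whole_record = dedupe_rows(rows)

# First column as the key: one row survives per id, and it is the last one seen.
by_first_column = dedupe_rows(rows, key_columns=[0])
```

With all columns as keys, three rows remain, because "1,alice,SF" differs from "1,alice,NY". With only the first column as the key, two rows remain, and the row kept for id 1 is the last one read.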
Regards,

I_Server_Whale · Post by **I_Server_Whale** » Mon May 08, 2006 2:05 pm

Hi,

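You can also achieve this in UNIX by using the "uniq" command in the before-stage routine of the Transformer. Note that "uniq" only removes adjacent duplicate lines, so the file has to be sorted first (or you can use "sort -u" instead). At this point, I can't recall whether there is a similar way in Windows.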
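
One thing to watch with "uniq" is that it compares each line only with the line before it. The sketch below imitates that behaviour in Python, purely as an illustration; the helper name `uniq` and the sample lines are made up, and it only mimics the command run with no options.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of identical adjacent lines, like `uniq` with no options."""
    return [line for line, _ in groupby(lines)]


# Hypothetical file contents with duplicates that are not all adjacent.
lines = ["b", "a", "b", "b", "a"]

# Unsorted input: only the adjacent pair of "b" lines is collapsed.
adjacent_only = uniq(lines)

# Sorted input: every duplicate becomes adjacent, so all of them are removed.
sorted_first = uniq(sorted(lines))
```

On the unsorted input, "a" and "b" each still appear twice. Only after sorting does each value appear once, which is why the file is normally sorted before it is passed to "uniq".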

Thanks,
Whale.