How can I remove duplicates from a sequential file?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

suresh_dsx
Participant
Posts: 160
Joined: Tue May 02, 2006 7:49 am


Post by suresh_dsx »

Hi guys,
I'm having trouble removing duplicates from a sequential file.
My job looks like this:

Sequential_file1------------>Transformer-------------->Sequential_file2
(source)

Could anyone please suggest a solution?
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Welcome Aboard,

Load all your data into a hashed file in order to remove duplicates based on a key column.
Your job design would then look like this:

SequentialFile--------->Transformer---->HashedFile.

Kris
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

As Kris mentioned, use the hashed file, and from there write the data back out to a sequential file.
Also, decide carefully what counts as a duplicate.
For example:
If you want to remove records that are duplicated in their entirety, define all the columns as keys.
If you just want to remove duplicates based on a certain column, define that column as the key.
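To make the effect of the key choice concrete, here is a minimal Python sketch (not DataStage code). It assumes that a write to a hashed file replaces any existing record with the same key, and models that with a dictionary. The sample data, the column names, and the dedupe_by_key helper are all made up for illustration.

```python
import csv
import io

def dedupe_by_key(rows, key_columns):
    """Keep one row per key; a later row with the same key replaces the earlier one."""
    latest = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        latest[key] = row  # overwrite, modelling a keyed write to a hashed file
    return list(latest.values())

# Hypothetical sample data standing in for the source sequential file.
SAMPLE = "id,name,city\n1,Ann,Pune\n2,Bob,Delhi\n1,Ann,Pune\n2,Bob,Mumbai\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
all_columns = list(rows[0].keys())

# All columns as the key: only exact repeats of a whole record are dropped.
whole_record = dedupe_by_key(rows, all_columns)

# Only "id" as the key: one row survives per id, and the last one read wins.
by_id = dedupe_by_key(rows, ["id"])

for row in by_id:
    print(row)
```

With all columns as the key, both Bob rows are kept because their cities differ. With only id as the key, just one Bob row is left, so the choice of key decides which records count as duplicates.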
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Hi,

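You can also achieve this in UNIX by running the "uniq" command from a before-stage subroutine of the Transformer. Keep in mind that uniq only removes adjacent duplicate lines, so the file needs to be sorted first. At this point, I can't recall whether there is a similar way in Windows.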
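As a rough illustration of how uniq behaves, the Python sketch below mimics its default mode, which compares each line only with the one immediately before it. The uniq function and the sample lines are illustrative stand-ins, not the real command.

```python
from itertools import groupby

def uniq(lines):
    """Collapse runs of identical adjacent lines, as uniq does by default."""
    return [line for line, _ in groupby(lines)]

# Hypothetical file contents with non-adjacent duplicates.
lines = ["b", "a", "b", "b", "a"]

# Unsorted input: only the adjacent "b", "b" pair is collapsed.
unsorted_result = uniq(lines)

# Sorted input: every duplicate becomes adjacent, so all of them are removed.
sorted_result = uniq(sorted(lines))

print(unsorted_result)
print(sorted_result)
```

This is why the file is normally sorted before uniq is applied; the trade-off is that the original record order is lost.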

Thanks,
Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE