remove duplicates

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

deva
Participant
Posts: 104
Joined: Fri Dec 29, 2006 1:54 pm

remove duplicates

Post by deva »

Hi,
I have a flat file with 7 key columns, one of which is cert_no. When loading the data, I need to load only records with a distinct cert_no.

I am using a hash file and passing all the columns (including the 7 key columns) through it.

If I do that, will I get distinct records?

The link between the hash file and the transformer is a "stream" link.
xjonny
Participant
Posts: 16
Joined: Tue Oct 03, 2006 2:06 am

Re: remove duplicates

Post by xjonny »

Hello, deva!
You have to mark only the cert_no column as the key, both when you write to the hash file and when you read from it. The hash file overwrites any existing record that has the same key, which is what makes the records "distinct". Some DS stages use the key information in their own way, so you may want to add a transformer before and after the hash file to change the key columns if you need to.

Be careful! For each key, the hash file keeps only the last record written. You may want to try the Aggregator stage instead; it has a lot of options that might suit you.
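
To make the "last record wins" behaviour concrete, here is a minimal Python sketch. It is not DataStage code; it only imitates a keyed write with a dictionary. The sample rows, the amount column, and the function names are made up for illustration; only cert_no comes from the question.

```python
# Hypothetical sample rows; only the cert_no column name comes from the thread.
rows = [
    {"cert_no": "C001", "amount": 10},
    {"cert_no": "C002", "amount": 20},
    {"cert_no": "C001", "amount": 30},  # duplicate key, arrives later
]


def dedupe_last_wins(records, key="cert_no"):
    """Imitate a keyed write: a later row with the same key overwrites the earlier one."""
    store = {}
    for rec in records:
        store[rec[key]] = rec
    return list(store.values())


def dedupe_first_wins(records, key="cert_no"):
    """Alternative: keep the first row seen for each key and ignore later ones."""
    store = {}
    for rec in records:
        store.setdefault(rec[key], rec)
    return list(store.values())


last = dedupe_last_wins(rows)
first = dedupe_first_wins(rows)

for rec in last:
    print("last wins: ", rec)
for rec in first:
    print("first wins:", rec)
```

Both versions return one row per cert_no; they differ only in which duplicate survives. If it matters which row you keep, sort the input accordingly before the keyed write, or pick the row explicitly.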
IT happens...