remove duplicates

deva · Post by **deva** » Fri Jan 18, 2008 10:28 am

Hi,
I am reading one flat file that has 7 key columns, one of which is cert_no. While loading the data, I need to load only records with a distinct cert_no.

I am using a hash file and passing all the columns (including the 7 key columns) through it.

If I do this, will I get distinct records?

The link between the hash file and the transformer is a "stream" link.

xjonny · Post by **xjonny** » Fri Jan 18, 2008 11:00 am

deva wrote: Hi,
I am reading one flat file that has 7 key columns, one of which is cert_no. While loading the data, I need to load only records with a distinct cert_no.

I am using a hash file and passing all the columns (including the 7 key columns) through it.

If I do this, will I get distinct records?

The link between the hash file and the transformer is a "stream" link.

Hello, deva!
Mark only the cert_no column as "key" when you write to the hash file and when you read from it. A hash file overwrites any existing record that has the same key, which is what makes the output "distinct" on that key. Other DS stages use the key information in their own way, so you will probably want to add a transformer before and after the hash file to change the key settings where you need to.

Be careful! For each key value, the hash file keeps only the last record written. You might want to try the aggregator stage instead; it has a lot of options that may suit you.
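
To make the "last record wins" behaviour concrete, here is a rough sketch in Python rather than DataStage. It only imitates a keyed overwrite with a dictionary; the sample rows, the extra column names (name, amount) and both helper functions are made up for illustration and are not part of any DataStage API.

```python
import csv
import io

# Hypothetical sample data: cert_no C001 appears twice.
SAMPLE = """cert_no,name,amount
C001,alpha,10
C002,beta,20
C001,gamma,30
"""


def dedupe_last_wins(rows, key="cert_no"):
    """Imitate a keyed overwrite: a later row replaces an earlier row with the same key."""
    by_key = {}
    for row in rows:
        by_key[row[key]] = row  # overwrite any earlier row for this key
    return list(by_key.values())


def dedupe_first_wins(rows, key="cert_no"):
    """Alternative policy: keep the first row seen for each key."""
    by_key = {}
    for row in rows:
        by_key.setdefault(row[key], row)  # ignore later rows for this key
    return list(by_key.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
last = dedupe_last_wins(rows)
first = dedupe_first_wins(rows)

for row in last:
    print(row["cert_no"], row["name"], row["amount"])
```

With the sample above, the last-wins version keeps the "gamma" row for C001 and silently drops "alpha". If you need the first row, or some aggregate of the duplicates, you have to choose that policy explicitly, which is the reason for looking at a different stage.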
