DSXchange

Posted: **Thu Mar 02, 2006 3:54 am**

Hi all,

I am new in DS. I have one doubt.My source is flat file,is having duplicates.I want to load the data to flat file(target).I dont want duplicates in my target.How can i do that?

please help.

Thanks in Advance

praveen

Posted: **Thu Mar 02, 2006 3:59 am**

Use a hashed file selecting the keys as the columns based on which you want eliminate the dups.

Posted: **Thu Mar 02, 2006 4:01 am**

Hi!

1. solution: While writing the data to the file, you write the key columns to a hashed file, which you lookup for knowing the duplicates. If this lookup is a success you discard this column. This gives you always the first row!

2.solution :Wrtie the data to a hashed file complete, read the hashed file and write your output file. This gives you always the last row!

Good Luck

Posted: **Thu Mar 02, 2006 4:01 am**

There are few ways to eliminate duplicates in a server job. Load data into hashed file (define the key) and load into target as hashed file will not allow duplicates. Stage variables/aggregation stage can be used to eliminate duplicates as well.

HTWH.

Regards
Elavenil

Posted: **Thu Mar 02, 2006 4:27 am**

Since your source is a flat file, you can pre-process this using "sort -u" on UNIX or if your server has a package like MKS Toolkit installed. If not, you can do a normal DOS sort (which won't remove duplicates) and then use a stage variable in a transform to store the previous row's value and then doing a compare with the current value to remove duplicates.

Both of these methods can be faster than using an interim hashed file, which is also a viable answer to your question.

Posted: **Thu Mar 02, 2006 4:29 am**

Hi

You can also use unix sort -u command to remove the duplicates.
--Balaji S.R

Posted: **Thu Mar 02, 2006 4:33 am**

Sorry did not notice the post by Arnd.

--Balaji S.R

Posted: **Thu Mar 02, 2006 5:17 am**

elavenil wrote:There are few ways to eliminate duplicates in a server job. Load data into hashed file (define the key) and load into target as hashed file will not allow duplicates. Stage variables/aggregation stage can be used to eliminate duplicates as well.

HTWH.

Regards
Elavenil

Thanks to all,

How can u eliminate duplicates using stage variables?

Regards
praveen

Posted: **Thu Mar 02, 2006 5:36 am**

Let's say you have two columns in your link, In.Key and In.Data and you want to remove duplicate keys and keep the first record.

Declare stage variables and derivations as:

Code: Select all

IsDuplicate     IF(In.Key = LastKey) THEN 1 ELSE 0
LastKey     In.Key

Then in your constraint put the value "IsDuplicate = 0" so that duplicates are not passed out of the transform stage. The order of "IsDuplicate" and "LastKey" is vital, if you change the order around the method will not work.

Posted: **Thu Mar 02, 2006 6:40 am**

Dare to search

DSXchange

How can i remove duplicate rows

How can i remove duplicate rows

Re: How can i remove duplicate rows