Remove duplicate keys
-
- Premium Member
- Posts: 43
- Joined: Wed Oct 18, 2006 6:03 am
Hello!
I would like to identify duplicate dealnumbers. If a dealnumber appears only once it's OK. If the same dealnumber is in two or more rows, I want to remove those rows and insert them into another output for further use.
This seems like it should be easy, but I can't figure out how.
Thank you for your help,
Mattias
It's really not all that complicated. All you need is input sorted on the key, two stage variables and two output links. The stage variables will compare the current key to the previous one and flag it. Use the output link constraints to constrain the output.
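The transformer logic described above is DataStage-specific, so it can't be shown runnable here; the following is a minimal Python sketch of the same idea (function and variable names are mine, not from the post). It assumes the rows are already sorted on the key:

```python
# Sketch of the suggested transformer logic: compare each row's key
# to the previous row's key and route on the resulting flag.
# Assumes input rows are (key, value) pairs sorted on key.

def split_on_repeat(rows):
    """First sight of a key goes to the 'unique' output;
    rows repeating the previous key go to the 'duplicate' output."""
    unique_out, duplicate_out = [], []
    prev_key = None
    for key, value in rows:
        if key == prev_key:                  # same key as previous row
            duplicate_out.append((key, value))
        else:                                # new key
            unique_out.append((key, value))
        prev_key = key                       # remember key for next row
    return unique_out, duplicate_out
```

Note that this routes the first occurrence of a repeated key to the unique output, which is exactly the objection raised in the follow-up post below.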
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
But I think my problem is a bit more complicated than that.
If a dealnumber is repeated, I want to remove all the rows containing that dealnumber; I don't want to keep any of them. If I only compare each dealnumber with the row below, I will always be left with one row per dealnumber, or am I wrong?
10 100
10 200
20 300
30 400
should generate:
first output:
20 300
30 400
second output:
10 100
10 200
thx,
Mattias
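The split Mattias describes (every row of a repeated key goes to the second output) can be sketched in Python with a two-pass count; this is just an illustration of the requirement, not the DataStage solution given later in the thread:

```python
from collections import Counter

def split_by_key_count(rows):
    """Keys seen exactly once go to the first output; every row
    of a repeated key goes to the second output."""
    counts = Counter(key for key, _ in rows)   # pass 1: count each key
    first_output = [(k, v) for k, v in rows if counts[k] == 1]
    second_output = [(k, v) for k, v in rows if counts[k] > 1]
    return first_output, second_output
```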
Do this.
Sort the incoming data on your key. Define two stage variables in the transformer, say condFlag and prevVal. They will detect duplicates and flag them. Both will be initialized to 0. Their derivations will be as follows:
Code: Select all
condFlag | if (prevVal <> src.key) then 'X' else 'Y'
prevVal | src.key
Have a link coming out of here to a hashed file keyed on src.key (your first column).
Constraint for buildHash: condFlag = 'Y'
In the same job, or perhaps a second job, feed the same source file again and do a lookup against this hashed file keyed on your first column (key). Use the constraint reflink.NOTFOUND for one output link, where reflink is your reference link name. This link will give you the "first output" you mentioned in your previous post: all the records whose key is not duplicated.
Have a second output link with constraint NOT(reflink.NOTFOUND). This will give you your "second output".
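The two-pass design above (build a hashed file of duplicated keys, then look the source up against it) can be sketched in Python, using a set in place of the hashed file; names and structure here are illustrative, not DataStage syntax:

```python
def build_duplicate_key_set(sorted_rows):
    """Pass 1: mimic the transformer + hashed file. Rows are sorted
    on key; when the current key equals the previous one (the
    condFlag = 'Y' case), record the key in the 'hashed file'."""
    dup_keys = set()
    prev = None
    for key, _ in sorted_rows:
        if key == prev:
            dup_keys.add(key)
        prev = key
    return dup_keys

def route_with_lookup(rows, dup_keys):
    """Pass 2: look each source row up against the hashed file.
    NOTFOUND (key absent)      -> first output (unique keys);
    NOT(NOTFOUND) (key present) -> second output (duplicates)."""
    first_output = [r for r in rows if r[0] not in dup_keys]
    second_output = [r for r in rows if r[0] in dup_keys]
    return first_output, second_output
```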
You won't be deleting anything. The hashed file will contain only the keys which have duplicates in your source file. Then you run your source file again against this hashed file: NOT(NOTFOUND) gives you the duplicate records and NOTFOUND gives you the non-duplicate records. No data will be deleted or missed.
"Step by step"
I feel like the new kid on the block, needing this much help!
Now it works and it is beautiful. Thank you very much. If you ever come by Santiago or Stockholm, let me buy you a beer ;-)
//Mattias