Page 1 of 1

Remove duplicate keys

Posted: Wed Jan 24, 2007 2:33 pm
by mattias.klint
Hello!

I would like to identify duplicate dealnumbers. If a dealnumber is shown one time it's ok. If the same dealnumber is in two rows or more I want to remove the rows and insert them into another output for further use.

This seems to be easy but i cant figure out how.

Thank you for your help,
Mattias

Posted: Wed Jan 24, 2007 2:42 pm
by narasimha
This has been dealt with many times, try a search. If you dont get results get back.

Posted: Wed Jan 24, 2007 3:01 pm
by DSguru2B
Refer to this post. Look at the last post by me. It gives a design description along with code.

Posted: Wed Jan 24, 2007 3:17 pm
by mattias.klint
Thanks, I will look into your answer, seems kind of complicated but I'll give it a shot. I guess I have to learn in which forum I have to post my questions aswell :-(

Thank you.
//Mattias

(If it solves my problem I'll get back and mark my question as solved.)

Posted: Wed Jan 24, 2007 3:34 pm
by DSguru2B
Its really not all that complicated. All you need is sorted input on the key, two stage variables and two output links. The stage variable will compare current key to previous and mark flag it. Use the output link constraints to constraint the output.

Posted: Thu Jan 25, 2007 10:36 am
by mattias.klint
But I think my problem is a more complicated than that.

If a dealnumber is repeated I want to remove all the rows containing the dealnumber. I dont want to keep any of them. If I only check if it is the same dealnumber with the row below I will always get one dealnumber left, or am I wrong.

10 100
10 200
20 300
30 400

should generate:
first output:
20 300
30 400

second output:
10 100
10 200

thx,
Mattias

Posted: Thu Jan 25, 2007 10:50 am
by DSguru2B
Do this.
Sort the incoming data on your key. Define two stage variables in the transformer, say condFlag and prevVal. The will basically detect duplicates and flag them. Their both will be initialized to 0. Their derivation will be as follows:

Code: Select all


condFlag  | if (prevVal <> src.key) then 'X' else 'Y' 
prevVal   | src.key 


Have a link coming out of here to a hashed file keyed on src.key (your first column).
Constraint for buildHash: condFlag = 'Y'

In the same job or maybe a second job feed the same source file and do a lookup on this hashed file keyed on your first column (key). Provide the constraint as reflink.NOTFOUNDfor one output link, where reflink is your reference link name. The link will give you your "first output" as you mentioned in your previours post. which will have all the records which are duplicates based on key.
Have a second output link with constraint NOT(reflink.NOTFOUND). This will give you your "second output".

Posted: Thu Jan 25, 2007 10:52 am
by mattias.klint
Ahh but If I run it through another transformer and check with the output file from the first transformer I can delete all instances of the dealnumbers that corresponds to the dealnumbers in theoutput file from the first trd.

That should work, but first lunch!

//Mattias

Posted: Thu Jan 25, 2007 11:00 am
by DSguru2B
You wont be deleting anything. The hashed file will contain only keys which have duplicates in your source file. Then you run your source file again against this hashed file and do a NOT of NOTFOUND for duplicate records and NOTFOUND non duplicate records. No data will be deleted or missed.

"Step by step"

Posted: Thu Jan 25, 2007 2:50 pm
by mattias.klint
I feel like the new kid on the block needing this much help!

Now it works and it is beautiful. Thank you very much. If you ever come by Santiago or Stockholm, let me buy you beer ;-)

//Mattias

Posted: Thu Jan 25, 2007 2:51 pm
by DSguru2B
Your most welcome :P

Posted: Thu Jan 25, 2007 2:52 pm
by ray.wurlod
Umm... last time I looked Santiago was nowhere near Stockholm. That must be pretty amazing beer you have!

Posted: Thu Jan 25, 2007 2:54 pm
by narasimha
If you had made that offer before, you would have got a quicker solution. Just kiddin :wink:

Posted: Thu Jan 25, 2007 2:55 pm
by mattias.klint
Depending when you have time for that beer ;-) Now Santigo, in the future it will be in Sthlm.

Posted: Thu Jan 25, 2007 2:58 pm
by DSguru2B
narasimha wrote:If you had made that offer before, you would have got a quicker solution. Just kiddin :wink:
101 % :P