Remove duplicate keys

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
mattias.klint
Premium Member
Premium Member
Posts: 43
Joined: Wed Oct 18, 2006 6:03 am

Remove duplicate keys

Post by mattias.klint »

Hello!

I would like to identify duplicate dealnumbers. If a dealnumber is shown one time it's ok. If the same dealnumber is in two rows or more I want to remove the rows and insert them into another output for further use.

This seems to be easy but i cant figure out how.

Thank you for your help,
Mattias
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

This has been dealt with many times, try a search. If you dont get results get back.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Refer to this post. Look at the last post by me. It gives a design description along with code.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mattias.klint
Premium Member
Premium Member
Posts: 43
Joined: Wed Oct 18, 2006 6:03 am

Post by mattias.klint »

Thanks, I will look into your answer, seems kind of complicated but I'll give it a shot. I guess I have to learn in which forum I have to post my questions aswell :-(

Thank you.
//Mattias

(If it solves my problem I'll get back and mark my question as solved.)
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Its really not all that complicated. All you need is sorted input on the key, two stage variables and two output links. The stage variable will compare current key to previous and mark flag it. Use the output link constraints to constraint the output.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mattias.klint
Premium Member
Premium Member
Posts: 43
Joined: Wed Oct 18, 2006 6:03 am

Post by mattias.klint »

But I think my problem is a more complicated than that.

If a dealnumber is repeated I want to remove all the rows containing the dealnumber. I dont want to keep any of them. If I only check if it is the same dealnumber with the row below I will always get one dealnumber left, or am I wrong.

10 100
10 200
20 300
30 400

should generate:
first output:
20 300
30 400

second output:
10 100
10 200

thx,
Mattias
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do this.
Sort the incoming data on your key. Define two stage variables in the transformer, say condFlag and prevVal. The will basically detect duplicates and flag them. Their both will be initialized to 0. Their derivation will be as follows:

Code: Select all


condFlag  | if (prevVal <> src.key) then 'X' else 'Y' 
prevVal   | src.key 


Have a link coming out of here to a hashed file keyed on src.key (your first column).
Constraint for buildHash: condFlag = 'Y'

In the same job or maybe a second job feed the same source file and do a lookup on this hashed file keyed on your first column (key). Provide the constraint as reflink.NOTFOUNDfor one output link, where reflink is your reference link name. The link will give you your "first output" as you mentioned in your previours post. which will have all the records which are duplicates based on key.
Have a second output link with constraint NOT(reflink.NOTFOUND). This will give you your "second output".
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mattias.klint
Premium Member
Premium Member
Posts: 43
Joined: Wed Oct 18, 2006 6:03 am

Post by mattias.klint »

Ahh but If I run it through another transformer and check with the output file from the first transformer I can delete all instances of the dealnumbers that corresponds to the dealnumbers in theoutput file from the first trd.

That should work, but first lunch!

//Mattias
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You wont be deleting anything. The hashed file will contain only keys which have duplicates in your source file. Then you run your source file again against this hashed file and do a NOT of NOTFOUND for duplicate records and NOTFOUND non duplicate records. No data will be deleted or missed.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mattias.klint
Premium Member
Premium Member
Posts: 43
Joined: Wed Oct 18, 2006 6:03 am

"Step by step"

Post by mattias.klint »

I feel like the new kid on the block needing this much help!

Now it works and it is beautiful. Thank you very much. If you ever come by Santiago or Stockholm, let me buy you beer ;-)

//Mattias
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Your most welcome :P
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Umm... last time I looked Santiago was nowhere near Stockholm. That must be pretty amazing beer you have!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

If you had made that offer before, you would have got a quicker solution. Just kiddin :wink:
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
mattias.klint
Premium Member
Premium Member
Posts: 43
Joined: Wed Oct 18, 2006 6:03 am

Post by mattias.klint »

Depending when you have time for that beer ;-) Now Santigo, in the future it will be in Sthlm.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

narasimha wrote:If you had made that offer before, you would have got a quicker solution. Just kiddin :wink:
101 % :P
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply