Vice Versa Duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Madhusudhan
Participant
Posts: 5
Joined: Mon Feb 07, 2011 1:17 am
Location: madhu

Vice Versa Duplicates

Post by Madhusudhan »

I have scenario.

Input :

source | destination | distance
city1 | city2 | 500
city2 | city1 |500
city3 | city4 | 500
city4 |city3 | 500

How to delete vice versa duplicates.

Output should be:

source | destination | distance
city1 | city2 | 500
city3 | city4 | 500

How to achieve above situation using datastage.

Thanks in advance
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi Madhusudhan,

If you use a transformer with stage variables set as follows:

svSource:
if DSLink2.source < DSLink2.destination then DSLink2.source else DSLink2.destination

svDestination:
if DSLink2.source < DSLink2.destination then DSLink2.destination else DSLink2.source

and map those to their respective output columns you will then have City1 and City2 in the right place so that a remove duplicates stage will do what you require.

Hope that helps!

Cheers,
Bob.
Bob Oxtoby
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Use a transformer to set field1 to be the min of field1 and field2, and set field2 to be the max of field1 and field2, then follow it with a typical remove duplicates stage with field1 and field2 as key.

If you want to retain the original order of field1 and field2 in your output, add a couple of columns to the transformer output (e.g. orig_field1 and orig_field2) and follow the remove duplicates stage with another transformer to restore the original ordering of field1 and field2.

Mike
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi Mike,

Unless I did something wrong in my test job, Min() and Max() both output zero, presumably because they are numeric functions not expecting non-numeric input?

Cheers,
Bob.
Bob Oxtoby
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I didn't mean using a literal min / max function ... simple use of if-then-else is sufficient.

Mike
Madhusudhan
Participant
Posts: 5
Joined: Mon Feb 07, 2011 1:17 am
Location: madhu

Post by Madhusudhan »

Thanks boxtoby.

Your logic worked for me.
Post Reply