Page 1 of 1

Vice Versa Duplicates

Posted: Wed Jan 08, 2014 6:53 am
by Madhusudhan
I have scenario.

Input :

source | destination | distance
city1 | city2 | 500
city2 | city1 |500
city3 | city4 | 500
city4 |city3 | 500

How to delete vice versa duplicates.

Output should be:

source | destination | distance
city1 | city2 | 500
city3 | city4 | 500

How to achieve above situation using datastage.

Thanks in advance

Posted: Wed Jan 08, 2014 10:19 am
by boxtoby
Hi Madhusudhan,

If you use a transformer with stage variables set as follows:

svSource:
if DSLink2.source < DSLink2.destination then DSLink2.source else DSLink2.destination

svDestination:
if DSLink2.source < DSLink2.destination then DSLink2.destination else DSLink2.source

and map those to their respective output columns you will then have City1 and City2 in the right place so that a remove duplicates stage will do what you require.

Hope that helps!

Cheers,
Bob.

Posted: Wed Jan 08, 2014 10:21 am
by Mike
Use a transformer to set field1 to be the min of field1 and field2, and set field2 to be the max of field1 and field2, then follow it with a typical remove duplicates stage with field1 and field2 as key.

If you want to retain the original order of field1 and field2 in your output, add a couple of columns to the transformer output (e.g. orig_field1 and orig_field2) and follow the remove duplicates stage with another transformer to restore the original ordering of field1 and field2.

Mike

Posted: Wed Jan 08, 2014 10:32 am
by boxtoby
Hi Mike,

Unless I did something wrong in my test job, Min() and Max() both output zero, presumably because they are numeric functions not expecting non-numeric input?

Cheers,
Bob.

Posted: Wed Jan 08, 2014 10:34 am
by Mike
I didn't mean using a literal min / max function ... simple use of if-then-else is sufficient.

Mike

Posted: Mon Jan 13, 2014 6:32 am
by Madhusudhan
Thanks boxtoby.

Your logic worked for me.