Page 1 of 1
Vice Versa Duplicates
Posted: Wed Jan 08, 2014 6:53 am
by Madhusudhan
I have scenario.
Input :
source | destination | distance
city1 | city2 | 500
city2 | city1 |500
city3 | city4 | 500
city4 |city3 | 500
How to delete vice versa duplicates.
Output should be:
source | destination | distance
city1 | city2 | 500
city3 | city4 | 500
How to achieve above situation using datastage.
Thanks in advance
Posted: Wed Jan 08, 2014 10:19 am
by boxtoby
Hi Madhusudhan,
If you use a transformer with stage variables set as follows:
svSource:
if DSLink2.source < DSLink2.destination then DSLink2.source else DSLink2.destination
svDestination:
if DSLink2.source < DSLink2.destination then DSLink2.destination else DSLink2.source
and map those to their respective output columns you will then have City1 and City2 in the right place so that a remove duplicates stage will do what you require.
Hope that helps!
Cheers,
Bob.
Posted: Wed Jan 08, 2014 10:21 am
by Mike
Use a transformer to set field1 to be the min of field1 and field2, and set field2 to be the max of field1 and field2, then follow it with a typical remove duplicates stage with field1 and field2 as key.
If you want to retain the original order of field1 and field2 in your output, add a couple of columns to the transformer output (e.g. orig_field1 and orig_field2) and follow the remove duplicates stage with another transformer to restore the original ordering of field1 and field2.
Mike
Posted: Wed Jan 08, 2014 10:32 am
by boxtoby
Hi Mike,
Unless I did something wrong in my test job, Min() and Max() both output zero, presumably because they are numeric functions not expecting non-numeric input?
Cheers,
Bob.
Posted: Wed Jan 08, 2014 10:34 am
by Mike
I didn't mean using a literal min / max function ... simple use of if-then-else is sufficient.
Mike
Posted: Mon Jan 13, 2014 6:32 am
by Madhusudhan
Thanks boxtoby.
Your logic worked for me.