Page 1 of 1

Scenario: Identify duplicated in a column

Posted: Tue Jul 22, 2014 4:39 am
by nagadastagirireddy
Hi,
I have a scenario in my project, in a column there are two values separated by comma as shown below.

Input
-----------
AUS,IND
AUS,PAK
IND,AUS
IND,PAK
PAK,AUS
PAK,IND

If you observe in the above data, there a combinations of "AUS,IND" and "IND,AUS". In this case I have to consider them as duplicates and load any one of them in table.
My output should look as below.

Expected Result
--------------------
AUS,IND
AUS,PAK
IND,PAK
PAK,AUS

Please help me with this scenario.

Many Thanks in advance.

Posted: Tue Jul 22, 2014 5:35 am
by stuartjvnorton
Transformer: split using Field and put into alphabetical order, eg: IND,AUS -> AUS,IND
Sort with key change
Filter by key change= 1

Posted: Tue Jul 22, 2014 9:29 am
by nagadastagirireddy
Hi,
If I split and sort the data then it will look like below.

After applying split and sort
-------------------------------
AUS,AUS
AUS,AUS
IND,IND
IND,IND
PAK,PAK
PAK,PAK

This might not give expected result.

Posted: Tue Jul 22, 2014 4:19 pm
by yugee
Simple, but works as required.

Transformer:

COL1 --> is your input with two fields separated by ','

Declare Below Stage Variables:

svIndex - Index(DSLink1.COL1,',',1) This will find the position of ','
svLength - Len(DSLink1.COL1) this is to find the length of your column
svField1 - Left(DSLink1.COL1,svIndex-1) Left portion of your input column
svField2 - Right(DSLink1.COL1,svLength-svIndex) Right portion of your input


Declare an output column:
MERGEKEY -

If Compare(svField1,svField2) > 0 Then
svField1:svField2
Else
svField2:svField1

Finally Have a Sort Stage - to remove duplicates on MERGEKEY

Let me know if it solves your requirement.

Posted: Tue Jul 22, 2014 10:37 pm
by nagadastagirireddy
Thank a lot Yugee, that works perfect.

But one small check, we can directly apply field function to seperate the string, instead why are you using Index and field?

Posted: Wed Jul 23, 2014 12:25 pm
by yugee
Good to know...

No specific reason for using Index - I was testing this function as part of some other issue...you can use any other functions to split the field into two...

Posted: Wed Jul 23, 2014 1:32 pm
by ssnegi
Read the input twice. Number the rows @INROWNUM.
Join COL1 to COL2 and COL2 to COL1 in lookup stage.
So there will be 4 cols col1,col2,col3,col4.
AUS,IND,IND,AUS,1,3
IND,AUS,AUS,IND,3,1
Then add Num1+Num2
AUS,IND,IND,AUS,4
IND,AUS,AUS,IND,4
Then do remove duplicate on Num.