Page 1 of 1

Removed duplicate strings in a column

Posted: Tue Jun 10, 2014 5:58 am
by Harini
Hi All,

Am trying to concatenate a error description column, based on key column.

Say my data is like this.

ID, Number, Error Code
100, MD350, Invalid Amount
101, MD310, Invalid Account
102, MD350, Invalid Code
103, MD400, Invalid Transaction
104, MD350, Invalid Code

I wanted combine the error code for all the transaction which has same number. Eg:

My output would look like:

100, MD350, Invalid Amount, Invalid Code, Invalid Code

I am able to achieve this using Sort, Key change and comparing previous and current values. The problem now i face is to remove duplicates there.

Say if i want to remove the 'Invalid Code' which is repeating twice there. Is there a way we can look for a string which is repeating in a column and remove it?

Thanks.

Posted: Tue Jun 10, 2014 6:28 am
by ArndW
There are several ways to do this, but I'd keep it simple and use the INDEX() function before adding a new error code to the string holding the error codes. If "INDEX(ConcatenatedString,In.ErrorCode,1)=0 THEN {add} ELSE {don't add}".

Posted: Tue Jun 10, 2014 12:56 pm
by ssnegi
you can use count function on the previous string.
if count(Previous String, "Current String") > 0 then don't concatenate else concatenate.

Posted: Fri Jun 20, 2014 10:57 pm
by Harini
Premium membership expired :(

Actually, the way, i would have to identify multiple errors would be in the same variable. Eg: If a record is going through multiple lookups, i check for the derived values which is null in the transformer in the same variable and am populating it.

Posted: Mon Jun 23, 2014 3:06 am
by mesuku
You can get this output by deleting duplicates ( Second & Third Columns in source) and then concatinate the error code coumn data using sort and transformer stage


Thanks
Suresh

Posted: Tue Jun 24, 2014 1:18 pm
by shepli
1) Sort the data with the Number and (then) ID as key
2) In a transformer, use a Stage Variable, e.g. MySV, to hold the "ID, Number, Error Code" values from the first row (no output).
2) Go thru the dataset row by row,
If the Nunber in a current row is the same as the previous row, concatenate the Error Code to the Stage Variable, no output (Note: you may not want to use comma as delimiter for the Error Code because it may created multiple columns);
If the Number is different from the previous row, output the value held in the Stage Variable and reset the Stage Variable to the "ID, Number, Error Code" values from the currentt row.
3) You need to have a way to handle the last row.

Hope the above helps.