Page 1 of 1

efficient string manipulation question

Posted: Wed Nov 11, 2015 6:51 am
by qt_ky
What would be the most efficient way to achieve the following logic in a Parallel job?

A varchar(100) column may contain zero or more commas anywhere in the string.

For each comma, if the previous character is a digit 0-9 and the following character is a digit 0-9, then remove the comma, otherwise replace the comma with a space.

If the column contains zero commas, pass the unchanged column value through. Examples:

Code: Select all

"$321.00 4 5" becomes "$321.00 4 5" (unchanged)
"1,234 miles" becomes "1234 miles"
"Lets eat, grandpa!" becomes "Let's eat  grandpa!"
"Area 5, Sections 6,7,8,,9" becomes "Area 5  Sections 678  9"
",Parts A,B,C," becomes " Parts A B C "

Posted: Wed Nov 11, 2015 8:34 am
by chulett
Sorry, just had to LOL at the classic "Let's eat grandpa!". For want of a comma... :wink:

Posted: Tue Nov 17, 2015 8:11 am
by ShaneMuir
Not sure about most efficient, but I can only think of 2 ways:
1. Using some sort of regular expression at the unix command line (eg sed)
2. If using DS, the some combination of stage and loop variables that would determine the number of commas (DCount), hold the current, previous and/or next delimited field, and perform tests on the first or last char of those fields as required, before rebuilding your output stream based on your requirements.

Posted: Wed Dec 02, 2015 12:13 am
by siauchun84
I will suggest you to use the QualityStage to perform that as you can customize the pattern as you wish.