Finding the occurence of a substring in a string

nyk1989 · Post by **nyk1989** » Tue Jan 10, 2017 4:19 am

hey guys,

need your urgent help!!!

i have an input column with data like -

ABC,DEF,GHI (3 values concatenated with comma delimiter)

In the output, I want the position of each values like -

for ABC - o/p should be 1
for DEF - o/p should be 2

Any idea how can we do this in DS? I tried Index function but there we get the position character wise - so that gives me wrong output.

chulett · Post by **chulett** » Tue Jan 10, 2017 7:57 am

Not quite following. Seems to me you could just number them yourself seeing as how there is always three. Yes? Or does the number change? In that case it might be as simple as counting the commas (add 1) but I think we'd need something more specific as to what exactly it is you need to do.

nyk1989 · Post by **nyk1989** » Tue Jan 10, 2017 8:55 am

I/p -
ID KEY VALUE
12 CBD XXX
12 GTI YYY
22 CBD 123
22 SC_ID 456
22 FUND 789

o/p -
ID CBD GTI SC_ID FUND
12 XXX YYY null null
22 123 null 456 789

so, this is how my i/p and o/p should be.

Current Design - I/P -> xfrmer -> Rem Dup -> xfrmer -> seq file

in the first xfm, i am concatenating the KEY and VALUE columns based on ID column.

o/p after this stage -

ID KEY VALUE KEY_AGG VALUE_AGG
12 CBD XXX CBD XXX
12 GTI YYY CBD,GTI XXX,YYY
and so on..

Rem dups - i am retaining the last row.

final xfm - i want to divide this into individual KEY columns.

i am thinking something like - ( this will be the derivation for the CBD column)

if count(KEY_AGG,CBD) = 1 then field(VALUE_AGG,',', index(KEY_AGG,'CBD',1),1) else setnull()

but here, index option will not work as it will calculate the position character wise and not string wise.

so if i can get the correct substring position, then that can be substituted here in the FIELD function and the proper VALUES will get mapped to the KEY columns.

ShaneMuir · Post by **ShaneMuir** » Tue Jan 10, 2017 9:44 am

Personally I would tackle it as follows:

1. Split off your column names and create a single delimited string of them all ordered alphabetically

Code: Select all

CBD|GTI|FUND|SC_ID

2. Join that delimited string of column names back to your original data
eg

Code: Select all

22 CBD 123 CBD|GTI|FUND|SC_ID

3. Use stage variables to do a find and replace of the column name with your value. This can be achieved by using a combo of Field,Index and Len functions.
Hold the following values in stage variables
a) Get the first column name in your delimited string using Field (EDIT: actually you don't need this)
b) Find the starting position of the column name using Index
c) Perform a Dcount of the string to find the column position
d) Use a stage variable to build your output string.
* Use the column position number to determine whether you need to output your input value. If not append a delimiter to represent an empty column. Eg The first value 123 is for Column CBD. So build start your output string accordingly:

Code: Select all

svOutputString: 123|

Next column to be populated for record 22 is FUND (alphabetical) so you find the start position of FUND in your string (9), and perform a DCount of the start of that string ie (DCount(Left(DelimitedString,9); (should return 3 - I can never remember if DCount returns the delimiter count or the column count)
With this information you know that you have jumped from Column 1 to column 3 so insert an empty column (ie an extra delimiter) and then place the value in your output string)

Code: Select all

svOutputString: 123||789

Continue doing this until you run out of columns.

chulett · Post by **chulett** » Tue Jan 10, 2017 12:40 pm

FYI - changed your subject to match your actual situation more better. From "position" to "occurrence".

chulett · Post by **chulett** » Tue Jan 10, 2017 12:54 pm

ShaneMuir wrote:I can never remember if DCount returns the delimiter count or the column count

From the documentation:

Use the DCOUNT function to return the number of delimited fields in a data string

UCDI · Post by **UCDI** » Tue Jan 10, 2017 1:52 pm

this is the sort of thing that is infinitely easier in a routine.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 11, 2017 11:15 pm

The easiest approach would be to use a loop in the Transformer stage. You will find an example in the documentation about processing a group of records. Make use of the SaveInputRecord() and GetSavedInputRecord() functions to maintain a "stack" of input records. Use a Loop variable with the Field() function, with @ITERATION as the third argument, to extract the individual values.