Finding the occurence of a substring in a string
Moderators: chulett, rschirm, roy
Finding the occurence of a substring in a string
hey guys,
need your urgent help!!!
i have an input column with data like -
ABC,DEF,GHI (3 values concatenated with comma delimiter)
In the output, I want the position of each values like -
for ABC - o/p should be 1
for DEF - o/p should be 2
Any idea how can we do this in DS? I tried Index function but there we get the position character wise - so that gives me wrong output.
need your urgent help!!!
i have an input column with data like -
ABC,DEF,GHI (3 values concatenated with comma delimiter)
In the output, I want the position of each values like -
for ABC - o/p should be 1
for DEF - o/p should be 2
Any idea how can we do this in DS? I tried Index function but there we get the position character wise - so that gives me wrong output.
Not quite following. Seems to me you could just number them yourself seeing as how there is always three. Yes? Or does the number change? In that case it might be as simple as counting the commas (add 1) but I think we'd need something more specific as to what exactly it is you need to do.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I/p -
ID KEY VALUE
12 CBD XXX
12 GTI YYY
22 CBD 123
22 SC_ID 456
22 FUND 789
o/p -
ID CBD GTI SC_ID FUND
12 XXX YYY null null
22 123 null 456 789
so, this is how my i/p and o/p should be.
Current Design - I/P -> xfrmer -> Rem Dup -> xfrmer -> seq file
in the first xfm, i am concatenating the KEY and VALUE columns based on ID column.
o/p after this stage -
ID KEY VALUE KEY_AGG VALUE_AGG
12 CBD XXX CBD XXX
12 GTI YYY CBD,GTI XXX,YYY
and so on..
Rem dups - i am retaining the last row.
final xfm - i want to divide this into individual KEY columns.
i am thinking something like - ( this will be the derivation for the CBD column)
if count(KEY_AGG,CBD) = 1 then field(VALUE_AGG,',', index(KEY_AGG,'CBD',1),1) else setnull()
but here, index option will not work as it will calculate the position character wise and not string wise.
so if i can get the correct substring position, then that can be substituted here in the FIELD function and the proper VALUES will get mapped to the KEY columns.
ID KEY VALUE
12 CBD XXX
12 GTI YYY
22 CBD 123
22 SC_ID 456
22 FUND 789
o/p -
ID CBD GTI SC_ID FUND
12 XXX YYY null null
22 123 null 456 789
so, this is how my i/p and o/p should be.
Current Design - I/P -> xfrmer -> Rem Dup -> xfrmer -> seq file
in the first xfm, i am concatenating the KEY and VALUE columns based on ID column.
o/p after this stage -
ID KEY VALUE KEY_AGG VALUE_AGG
12 CBD XXX CBD XXX
12 GTI YYY CBD,GTI XXX,YYY
and so on..
Rem dups - i am retaining the last row.
final xfm - i want to divide this into individual KEY columns.
i am thinking something like - ( this will be the derivation for the CBD column)
if count(KEY_AGG,CBD) = 1 then field(VALUE_AGG,',', index(KEY_AGG,'CBD',1),1) else setnull()
but here, index option will not work as it will calculate the position character wise and not string wise.
so if i can get the correct substring position, then that can be substituted here in the FIELD function and the proper VALUES will get mapped to the KEY columns.
Personally I would tackle it as follows:
1. Split off your column names and create a single delimited string of them all ordered alphabetically
2. Join that delimited string of column names back to your original data
eg
3. Use stage variables to do a find and replace of the column name with your value. This can be achieved by using a combo of Field,Index and Len functions.
Hold the following values in stage variables
a) Get the first column name in your delimited string using Field (EDIT: actually you don't need this)
b) Find the starting position of the column name using Index
c) Perform a Dcount of the string to find the column position
d) Use a stage variable to build your output string.
* Use the column position number to determine whether you need to output your input value. If not append a delimiter to represent an empty column. Eg The first value 123 is for Column CBD. So build start your output string accordingly:
Next column to be populated for record 22 is FUND (alphabetical) so you find the start position of FUND in your string (9), and perform a DCount of the start of that string ie (DCount(Left(DelimitedString,9); (should return 3 - I can never remember if DCount returns the delimiter count or the column count)
With this information you know that you have jumped from Column 1 to column 3 so insert an empty column (ie an extra delimiter) and then place the value in your output string)
Continue doing this until you run out of columns.
1. Split off your column names and create a single delimited string of them all ordered alphabetically
Code: Select all
CBD|GTI|FUND|SC_ID
eg
Code: Select all
22 CBD 123 CBD|GTI|FUND|SC_ID
Hold the following values in stage variables
a) Get the first column name in your delimited string using Field (EDIT: actually you don't need this)
b) Find the starting position of the column name using Index
c) Perform a Dcount of the string to find the column position
d) Use a stage variable to build your output string.
* Use the column position number to determine whether you need to output your input value. If not append a delimiter to represent an empty column. Eg The first value 123 is for Column CBD. So build start your output string accordingly:
Code: Select all
svOutputString: 123|
With this information you know that you have jumped from Column 1 to column 3 so insert an empty column (ie an extra delimiter) and then place the value in your output string)
Code: Select all
svOutputString: 123||789
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The easiest approach would be to use a loop in the Transformer stage. You will find an example in the documentation about processing a group of records. Make use of the SaveInputRecord() and GetSavedInputRecord() functions to maintain a "stack" of input records. Use a Loop variable with the Field() function, with @ITERATION as the third argument, to extract the individual values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.