Finding the occurence of a substring in a string

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
nyk1989
Participant
Posts: 22
Joined: Mon Dec 14, 2015 9:32 am

Finding the occurence of a substring in a string

Post by nyk1989 »

hey guys,

need your urgent help!!!

i have an input column with data like -

ABC,DEF,GHI (3 values concatenated with comma delimiter)

In the output, I want the position of each values like -

for ABC - o/p should be 1
for DEF - o/p should be 2

Any idea how can we do this in DS? I tried Index function but there we get the position character wise - so that gives me wrong output.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not quite following. Seems to me you could just number them yourself seeing as how there is always three. Yes? Or does the number change? In that case it might be as simple as counting the commas (add 1) but I think we'd need something more specific as to what exactly it is you need to do.
-craig

"You can never have too many knives" -- Logan Nine Fingers
nyk1989
Participant
Posts: 22
Joined: Mon Dec 14, 2015 9:32 am

Post by nyk1989 »

I/p -
ID KEY VALUE
12 CBD XXX
12 GTI YYY
22 CBD 123
22 SC_ID 456
22 FUND 789

o/p -
ID CBD GTI SC_ID FUND
12 XXX YYY null null
22 123 null 456 789

so, this is how my i/p and o/p should be.

Current Design - I/P -> xfrmer -> Rem Dup -> xfrmer -> seq file

in the first xfm, i am concatenating the KEY and VALUE columns based on ID column.

o/p after this stage -

ID KEY VALUE KEY_AGG VALUE_AGG
12 CBD XXX CBD XXX
12 GTI YYY CBD,GTI XXX,YYY
and so on..

Rem dups - i am retaining the last row.

final xfm - i want to divide this into individual KEY columns.

i am thinking something like - ( this will be the derivation for the CBD column)

if count(KEY_AGG,CBD) = 1 then field(VALUE_AGG,',', index(KEY_AGG,'CBD',1),1) else setnull()

but here, index option will not work as it will calculate the position character wise and not string wise.

so if i can get the correct substring position, then that can be substituted here in the FIELD function and the proper VALUES will get mapped to the KEY columns.
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Personally I would tackle it as follows:

1. Split off your column names and create a single delimited string of them all ordered alphabetically

Code: Select all

CBD|GTI|FUND|SC_ID
2. Join that delimited string of column names back to your original data
eg

Code: Select all

22 CBD 123 CBD|GTI|FUND|SC_ID
3. Use stage variables to do a find and replace of the column name with your value. This can be achieved by using a combo of Field,Index and Len functions.
Hold the following values in stage variables
a) Get the first column name in your delimited string using Field (EDIT: actually you don't need this)
b) Find the starting position of the column name using Index
c) Perform a Dcount of the string to find the column position
d) Use a stage variable to build your output string.
* Use the column position number to determine whether you need to output your input value. If not append a delimiter to represent an empty column. Eg The first value 123 is for Column CBD. So build start your output string accordingly:

Code: Select all

svOutputString: 123|
Next column to be populated for record 22 is FUND (alphabetical) so you find the start position of FUND in your string (9), and perform a DCount of the start of that string ie (DCount(Left(DelimitedString,9); (should return 3 - I can never remember if DCount returns the delimiter count or the column count)
With this information you know that you have jumped from Column 1 to column 3 so insert an empty column (ie an extra delimiter) and then place the value in your output string)

Code: Select all

svOutputString: 123||789
Continue doing this until you run out of columns.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

FYI - changed your subject to match your actual situation more better. From "position" to "occurrence".
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ShaneMuir wrote:I can never remember if DCount returns the delimiter count or the column count
From the documentation:

Use the DCOUNT function to return the number of delimited fields in a data string
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

this is the sort of thing that is infinitely easier in a routine.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The easiest approach would be to use a loop in the Transformer stage. You will find an example in the documentation about processing a group of records. Make use of the SaveInputRecord() and GetSavedInputRecord() functions to maintain a "stack" of input records. Use a Loop variable with the Field() function, with @ITERATION as the third argument, to extract the individual values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply