Hello all,
We need to extract the year and quarter from a string. The source string may contain values like the following:
ABCD1998.3EFH.GHAI.678ABCD.EF
1998.3ABCD.EFH.AJG.SS.123SSS...SSS...
ABC.DEF.EF345GH.124SJLAS..1998.3
We need to extract 1998 as the year from the three strings above, and also the quarter, which follows the year after a dot.
I have searched the forum and experimented with the index, field and convert functions, but none of them worked.
Thanks & Regards
Koti
Pattern - String Extraction
Use the following Unix command in an External Filter stage.
sed -n 's/.*\([0-9][0-9][0-9][0-9]\.[1-4]\).*/\1/p'
Last edited by rkashyap on Thu Jul 30, 2015 9:48 pm, edited 1 time in total.
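To see what the sed expression does outside DataStage, here is a minimal sketch that runs it over the three sample strings from the original question and then splits the result into year and quarter. The function name extract_year_quarter and the loop are only illustrative assumptions; in the job itself the sed command alone would be the filter.

```shell
#!/bin/sh
# Extract the "YYYY.Q" token (four digits, a dot, a quarter digit 1-4)
# from one input string, using the sed expression from the post above.
# extract_year_quarter is a hypothetical helper name, not part of DataStage.
extract_year_quarter() {
    printf '%s\n' "$1" | sed -n 's/.*\([0-9][0-9][0-9][0-9]\.[1-4]\).*/\1/p'
}

# Split the token into separate year and quarter values with
# POSIX parameter expansion.
for s in \
    'ABCD1998.3EFH.GHAI.678ABCD.EF' \
    '1998.3ABCD.EFH.AJG.SS.123SSS...SSS...' \
    'ABC.DEF.EF345GH.124SJLAS..1998.3'
do
    yq=$(extract_year_quarter "$s")
    year=${yq%.*}      # text before the dot
    quarter=${yq#*.}   # text after the dot
    printf 'year=%s quarter=%s\n' "$year" "$quarter"
done
```

Two things to keep in mind: the leading .* is greedy, so if a string holds more than one candidate the last one is returned, and with -n plus the p flag a line without any match produces no output at all.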
So... you need to find the first occurrence of four numeric digits and then the single digit that follows immediately behind that after a 'dot'? Two separate fields or one? And is that always the case or if the four don't have a dot right there after them do you keep looking deeper in the string? Give up? Want to make sure we understand the full glory of your requirements.
Why don't you show us what you tried and what didn't work about it. Show us your thought process, how you approached this.
Oh, and what is your source? Don't want to make any assumptions there.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks Kashyap, you are awesome. I was able to do this with stage variables, though:
StageVar1 - Converted all '.' characters and digits to tilde (~) using the convert function.
StageVar2 - Used the index function to get the position of 6 consecutive tildes in StageVar1, then used that position to pull the substring from the actual field.
I made sure that tildes (~) do not occur in the actual data.
Thanks Craig, your question helped me raise a question with source.
Thanks & Regards
Koti
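For readers who want to try the masking idea from the post above outside a Transformer, here is a rough sketch in shell and awk. It is not the job's actual derivation. The helper name mask_extract is hypothetical, and the sketch makes one deliberate change that is my assumption: only digits are mapped to '~' and dots are left alone, so the search pattern becomes ~~~~.~. If dots and digits were mapped to the same character, a run such as "..1998.3" in the third sample would contain six tildes starting at the first dot, and the substring would come out two characters early.

```shell
#!/bin/sh
# mask_extract (hypothetical name): build a masked copy of the input in
# which every digit is "~", locate the shape "~~~~.~" in the mask, and
# cut the same six positions out of the original string.
# Assumes "~" never occurs in the real data, as noted in the post.
mask_extract() {
    printf '%s\n' "$1" | awk '{
        mask = $0
        gsub(/[0-9]/, "~", mask)       # digits become "~", dots stay
        pos = index(mask, "~~~~.~")    # 1-based position, 0 if absent
        if (pos > 0) print substr($0, pos, 6)
    }'
}

mask_extract 'ABCD1998.3EFH.GHAI.678ABCD.EF'
mask_extract '1998.3ABCD.EFH.AJG.SS.123SSS...SSS...'
mask_extract 'ABC.DEF.EF345GH.124SJLAS..1998.3'
```

Unlike the sed expression, this picks the first occurrence rather than the last, and it does not check that the digit after the dot is between 1 and 4.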
You are welcome.
Just FYI for anyone using the External Filter stage: in previous versions, the stage had an issue processing special characters. See the technote. If the issue still occurs, push the arguments to a file, either as described in the technote or by using sed's "-f" option.
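As an illustration of the "-f" route, here is a small sketch that stores the substitution in a script file and lets sed read it from there, so no backslashes or brackets have to pass through the command line. The file location and the helper name extract_with_script_file are assumptions made for this example only; the technote's own procedure may differ.

```shell
#!/bin/sh
# Hypothetical script path; any location readable by the job would do.
script_file=${TMPDIR:-/tmp}/year_quarter.sed

# Put the substitution in a file. The quoted heredoc delimiter keeps the
# shell from touching the backslashes.
cat > "$script_file" <<'EOF'
s/.*\([0-9][0-9][0-9][0-9]\.[1-4]\).*/\1/p
EOF

# Same extraction as before, but sed reads its commands from the file.
extract_with_script_file() {
    printf '%s\n' "$1" | sed -n -f "$script_file"
}

extract_with_script_file 'ABC.DEF.EF345GH.124SJLAS..1998.3'
```

With this in place, the filter command would presumably shrink to sed -n -f followed by the path of the script file.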
Just for future reference ... the Basic Transformer's MATCHFIELD function could also have been used for pattern matching and string extraction.