Substring problem
I am using DS v7.5.1A on Windows. How can we extract just the filename from an input string that can take any of the following forms:
FileName "E:\\Ascential\\DataStage\\Projects\\TestProj\\DataDir\\myfilename"
or
FileName "E:\\Ascential\\DataStage\\Projects\\TestProj\\DataDir\\myfile.txt"
or
FileName "myfilename3"
or
FileName "file.text"
or
FileName "C:\\DataFiles\\somefile.txt"
NOTE: The string FileName is part of the input string, and the quotes appear in the input.
I tried to use the FIELD function, but the dot (.) or the (\\) symbol may or may not be present. Besides, the number of instances of (\\) varies. I could not use the LEFT or RIGHT functions for similar reasons. SUBSTR() cannot be used since the names of the files can vary. The EREPLACE function will not get me the filename.
My goal is to replace each string with-
FileName "#FileDir#\\myfilename" (for first example)
FileName "#FileDir#\\myfile.txt" (for second example), and so on.
Can somebody please suggest a simple way?
Thanks,
gateleys
Count the slashes and the filename is after the last slash.
Code: Select all
* Count the backslashes; the filename is the field after the last one
SlashCount=COUNT(yourstring,"\")
If SlashCount > 0 Then
   LastWord=FIELD(yourstring, "\", SlashCount+1)
End Else
   LastWord=yourstring
End
* Strip a trailing and then a leading double quote, if present
If RIGHT(LastWord,1)='"' Then LastWord=LastWord[1,LEN(LastWord)-1]
If LEFT(LastWord,1)='"' Then LastWord=LastWord[2,LEN(LastWord)-1]
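To check the logic outside DataStage, the same idea can be sketched in Python (this is not DataStage BASIC, and the names `extract_filename`, `rewrite_filename_line` and `SEPARATOR` are hypothetical). The sketch assumes each input line has the form `FileName "<path>"` and that the doubled backslashes appear literally in the input, as shown in the question. It takes whatever follows the last backslash inside the quotes, and it also covers the inputs with no backslash at all, where the whole line would otherwise come back.

```python
SEPARATOR = "\\\\"  # two literal backslashes, as written in the examples


def extract_filename(line: str) -> str:
    """Return the text after the last backslash inside the quoted value."""
    first = line.find('"')
    last = line.rfind('"')
    # Use the text between the outermost quotes; fall back to the whole line.
    value = line[first + 1:last] if 0 <= first < last else line
    # rpartition returns the whole value as its last element when no
    # backslash is present, so bare filenames pass through unchanged.
    return value.rpartition("\\")[2]


def rewrite_filename_line(line: str) -> str:
    """Rebuild the line with #FileDir# in place of the original directory."""
    return 'FileName "#FileDir#' + SEPARATOR + extract_filename(line) + '"'


print(rewrite_filename_line(r'FileName "C:\\DataFiles\\somefile.txt"'))
```

Splitting on a single backslash works even when the separators are doubled, because the filename is still the last field.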
Last edited by kcbland on Fri Apr 28, 2006 1:22 pm, edited 3 times in total.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Count the number of separators and then dynamically use that to extract the 'last' FIELD. I've done this many times with a single separator character; using '\\' as one should be possible. In the worst case, count the number of them and divide by 2. ;)
-craig
"You can never have too many knives" -- Logan Nine Fingers
Got it to work, except that the third statement in Kenneth's code needs to add 1 to the SlashCount:
Code: Select all
LastWord=FIELD(yourstring, "\", SlashCount + 1)
Kudos to you.
gateleys
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Look in the DataStage BASIC manual for two subroutines that work with pathnames. They are called, if my memory serves, !GET.PATHNAME and !MAKE.PATHNAME.
By using these to deconstruct and construct pathnames you will have a completely portable and idiot-proof mechanism.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I was searching for methods to deal with Pathname / Basename processing, which led me to this thread.
kcbland's post (second one) gave me the idea to write a generic routine to process strings "backwards" (right to left). From this you can derive Pathname and Basename.
Ray's post described the !GET.PATHNAME subroutine, but I was wondering whether there is a use for a generic routine that processes strings right to left given an arbitrary delimiter. Plus, I didn't read the entire thread before I was off and running writing the routine.
Doh!
Regardless, I'd love some feedback on whether this routine is useful, can be improved, or just reinvents a perfectly round wheel.
If you're interested in having a look and providing feedback, see http://docs.google.com/Doc?docid=dcdxxj ... sh5g&hl=en for a DSX file. The functionality should be obvious from the routine test cases.
Thanks,
Scott
I already have such a routine, called FinalDelimitedSubstring. You get it as a bonus when you download my date routines.
ray.wurlod wrote: I already have such a routine, called FinalDelimitedSubstring. You get it as a bonus when you download my date routines
Thanks Ray. I appreciate you sharing these useful routines.
I think my particular routine is a bit more generic, but I may be recreating functionality already in the product. I thus encourage further feedback.
Last edited by sbass1 on Tue Mar 31, 2009 8:11 pm, edited 1 time in total.
Feedback on Scott's routine
1. Any routine you write for others to use should explicitly test for and handle (usually by returning null, maybe throwing a warning) any unassigned or null input arguments. Use the UnAssigned() and IsNull() functions to test for these.
2. There is no need for a DEFFUN declaration since this routine does not make a recursive call to itself.
3. To conform with regular DataStage string management, you should allow but truncate decimal numeric arguments. 2.7 becomes 2, -3.3 becomes -3, and so on.
Code: Select all
If arg Matches "1N0N'.'0N" : @VM : "'-'1N0N'.'0N" Then arg = Field(arg, ".", 1, 1)
4. To conform with regular DataStage string management, any reference to a non-existent position should return a zero-length string, rather than null. For example, "if Occurrence has slipped back past beginning of string" return empty string.
5. Beware that "0N" matches zero or more numeric characters. A safer test for integer is "1N0N" : @VM : "'-'1N0N". Your pattern matches "".
6. Documentation on the General tab is extremely sparse. It should contain usage instructions, ideally with examples, and perhaps a reference to the Test grid and the fact that ad hoc testing should be at the end of the grid.
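Points 3 and 5 can be illustrated outside DataStage with a small Python sketch. The regular expressions below are only rough stand-ins for the BASIC patterns quoted above (an assumption for illustration, since MATCHES patterns are not regular expressions), and the names `is_integer` and `truncate_decimal` are hypothetical.

```python
import re

# Approximate regex counterparts of the BASIC patterns (illustrative only).
ZERO_OR_MORE_DIGITS = re.compile(r"[0-9]*")   # like "0N": also matches ""
INTEGER = re.compile(r"-?[0-9]+")             # like "1N0N" or "'-'1N0N"
DECIMAL = re.compile(r"-?[0-9]+\.[0-9]*")     # like "1N0N'.'0N" or "'-'1N0N'.'0N"


def is_integer(arg: str) -> bool:
    """True only when arg has at least one digit, with an optional leading minus."""
    return INTEGER.fullmatch(arg) is not None


def truncate_decimal(arg: str) -> str:
    """Drop the fractional part of a decimal string; leave other input unchanged."""
    if DECIMAL.fullmatch(arg):
        return arg.split(".", 1)[0]
    return arg


print(truncate_decimal("2.7"), truncate_decimal("-3.3"), is_integer(""))
```

The zero-or-more pattern accepts the empty string, which is why the one-or-more form is the safer integer test.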
1. I've coded it such that NO arguments are required - whether that's good is another discussion, since that deviates from the requirements for Field:
A) If pString is empty, return empty. If pString is null, return null
B) If pDelimiter is empty, return pString.
C) If pOccurence is empty, default to 1.
D) If pNumSubstr is empty, default to 1.
What do you think the functionality should be in these scenarios?
2. Noted, thanks. Online code updated.
3. Hmmm, is that really "regular DataStage string management", to silently alter the format of required arguments? Other languages I've used would throw an error if the argument was not in the required format, to avoid unintended but silent results. I do specifically check for non-integer arguments, and throw an error if it's not an integer.
4. Thanks, code changed to return an empty string. Null is still returned if there is an error. Online code updated.
5. See #1 above.
6. Thanks. I'll update the long description once the design is finalized based on feedback.
I appreciate the comments!
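For discussion, here is a minimal Python sketch of the behaviour described in A) to D), combined with the suggestions to return null for null arguments and an empty string for a non-existent position. It is not the routine from the DSX file. The name `field_from_right` is hypothetical, `None` stands in for the DataStage null, and the meaning of the fourth argument (return that many fields, extending leftward from the chosen occurrence) is an assumption.

```python
def _default_to_one(value):
    # Rules C and D: an empty argument defaults to 1.
    # int() raises ValueError for non-integer text such as "2.7".
    return 1 if value == "" else int(value)


def field_from_right(string, delimiter="", occurrence="", num_substr=""):
    """Return fields of string counted from the right-hand end."""
    # Null in, null out (None plays the role of the DataStage null).
    if None in (string, delimiter, occurrence, num_substr):
        return None
    # Rules A and B: empty string or empty delimiter returns the string as is.
    if string == "" or delimiter == "":
        return string
    occ = _default_to_one(occurrence)
    num = _default_to_one(num_substr)
    fields = string.split(delimiter)
    end = len(fields) - occ + 1          # slice end (exclusive)
    # A position before the start of the string yields an empty string.
    if occ < 1 or num < 1 or end < 1:
        return ""
    start = max(end - num, 0)
    return delimiter.join(fields[start:end])


print(field_from_right("C:\\DataFiles\\somefile.txt", "\\"))
```

With the defaults, the call returns the last field, which is the basename when the delimiter is a backslash.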
1. Empty is not the same as null or unassigned. What you have is OK for empty, but the function should return null if any of its arguments is null or in an unassigned state.
3. Open your DataStage BASIC manual and search for "truncated". You'll find that this is documented behaviour for many functions.