Substring problem

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

I am using DS v7.5.1A on Windows. How can I extract just the filename from an input string that can take any of these forms:
FileName "E:\\Ascential\\DataStage\\Projects\\TestProj\\DataDir\\myfilename"
or
FileName "E:\\Ascential\\DataStage\\Projects\\TestProj\\DataDir\\myfile.txt"
or
FileName "myfilename3"
or
FileName "file.text"
or
FileName "C:\\DataFiles\\somefile.txt"

NOTE: The string FileName is part of the input string, and the quotes appear in the input.
I tried the FIELD function, but the dot (.) and the (\\) symbol may or may not be present, and the number of instances of (\\) varies. I could not use the LEFT or RIGHT functions for similar reasons. SUBSTR() cannot be used since the names of the files vary. The EREPLACE function will not get me the filename.

My goal is to replace each string with-
FileName "#FileDir#\\myfilename" (for first example)

FileName "#FileDir#\\myfile.txt" (for second example), and so on.

Can somebody please suggest a simple way?

Thanks,
gateleys
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Count the slashes and the filename is after the last slash.

Code: Select all

* Count the backslashes; the filename is the field after the last one
SlashCount=COUNT(yourstring,"\")
If SlashCount > 0 Then
   LastWord=FIELD(yourstring, "\", SlashCount+1)
End Else
   LastWord=yourstring
End
* Strip a trailing and then a leading double quote, if present
If RIGHT(LastWord,1)='"' Then LastWord=LastWord[1,LEN(LastWord)-1]
If LEFT(LastWord,1)='"' Then LastWord=LastWord[2,LEN(LastWord)-1]

Last edited by kcbland on Fri Apr 28, 2006 1:22 pm, edited 3 times in total.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Count the number of separators and then dynamically use that to extract the 'last' FIELD. I've done this many times with a single separator character, having '\\' as one should be possible. At worst case, count the number of them and divide by 2. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by gateleys »

Thanks Kenneth and Craig,
Your method looks logical. I am trying it out. Will seek your help if I face any problem. :)

gateleys

Post by gateleys »

Got it to work, except that the third statement in Kenneth's code needs to add 1 to the SlashCount:

Code: Select all

LastWord=FIELD(yourstring, "\", SlashCount + 1)

Kudos to you.
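
For reference, the whole replacement for one input line might look like the sketch below. It combines the counting approach above with the "#FileDir#" prefix from the original question. It assumes the value always sits between the first pair of double quotes on the line. InLine, Value, Tail and OutLine are made-up names for illustration, not taken from the original job.

```basic
* Sketch only: all variable names here are illustrative.
* InLine holds one input line, e.g. FileName "C:\\DataFiles\\somefile.txt"
Value = Field(InLine, '"', 2)             ;* text between the first pair of quotes
SlashCount = Count(Value, "\")
Tail = Field(Value, "\", SlashCount + 1)  ;* the whole value when there is no backslash
OutLine = 'FileName "#FileDir#\\' : Tail : '"'
```

Taking the quoted value first means the lines without any backslash, such as FileName "myfilename3", go through the same path: with SlashCount at 0, Field returns the whole value.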

gateleys
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Look in the DataStage BASIC manual for two subroutines that work with pathnames. They are called, if my memory serves, !GET.PATHNAME and !MAKE.PATHNAME.

By using these to deconstruct and construct pathnames you will have a completely portable and idiot-proof mechanism.
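
A call to !GET.PATHNAME might look roughly like the sketch below. The argument order (pathname in; directory name, file name and status out) and the meaning of a zero status are assumptions from memory and should be checked against the BASIC manual. How the subroutine treats the doubled backslashes in the input above is also not confirmed in this thread. The variable names and the "PathDemo" label are illustrative.

```basic
* Sketch only: argument order and status values are assumptions, please verify.
Pathname = "C:\DataFiles\somefile.txt"
DirName = ""
BaseName = ""
Status = 0
Call !GET.PATHNAME(Pathname, DirName, BaseName, Status)
If Status = 0 Then
   Call DSLogInfo("File name part: " : BaseName, "PathDemo")
End Else
   Call DSLogWarn("!GET.PATHNAME returned status " : Status, "PathDemo")
End
```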
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by kcbland »

I like using basename, but since it's a Windoze platform that option is out.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

I was searching for methods to deal with Pathname / Basename processing, which led me to this thread.

kcbland's post (second one) gave me the idea to write a generic routine to process strings "backwards" (right to left). From this you can derive Pathname and Basename.
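
The core of the right-to-left idea can be sketched in a few lines. This is not the routine from the DSX file, only an illustration of the technique, assuming a single-character delimiter. pString, pDelimiter and pOccurrence are hypothetical argument names, with pOccurrence counted from the right (1 = last field).

```basic
* Sketch only: pick the Nth delimited field counting from the right.
FieldCount = DCount(pString, pDelimiter)
Position = FieldCount - pOccurrence + 1
If Position >= 1 Then
   Ans = Field(pString, pDelimiter, Position)
End Else
   Ans = ""   ;* occurrence lies before the start of the string
End
```

With a backslash as the delimiter and pOccurrence set to 1, this gives the Basename; everything to the left of that field is the Pathname.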

Ray's post described the !GET.PATHNAME subroutine, but I was wondering if there is use for a generic routine to process strings right to left given an arbitrary delimiter. Plus I didn't read the entire thread before I was off and running writing the routine :roll: Doh!

Regardless, I'd love some feedback on whether this routine is useful, can be improved, or just reinvents a perfectly round wheel.

If you're interested in having a look and providing feedback, see http://docs.google.com/Doc?docid=dcdxxj ... sh5g&hl=en for a DSX file. The functionality should be obvious from the routine test cases.

Thanks,
Scott

Post by ray.wurlod »

I already have such a routine, called FinalDelimitedSubstring. You get it as a bonus when you download my date routines.

Post by sbass1 »

ray.wurlod wrote:I already have such a routine, called FinalDelimitedSubstring. You get it as a bonus when you download my date routines
Thanks Ray. I appreciate you sharing these useful routines.

I think my particular routine is a bit more generic, but I may be recreating functionality already in the product. I thus encourage further feedback.
Last edited by sbass1 on Tue Mar 31, 2009 8:11 pm, edited 1 time in total.

Feedback on Scott's routine

Post by ray.wurlod »

1. Any routine you write for others to use should explicitly test for and handle (usually by returning null, maybe throwing a warning) any unassigned or null input arguments - use UnAssigned() and IsNull() functions to test these.

2. There is no need for a DEFFUN declaration since this routine does not make a recursive call to itself.

3. To conform with regular DataStage string management, you should allow but truncate decimal numeric arguments. 2.7 becomes 2, -3.3 becomes -3, and so on.

Code: Select all

 If arg Matches "1N0N'.'0N" : @VM : "'-'1N0N'.'0N" Then arg = Field(arg, ".", 1, 1)

4. To conform with regular DataStage string management, any reference to a non-existent position should return a zero-length string, rather than null. For example, "if Occurrence has slipped back past beginning of string" return empty string.

5. Beware that "0N" matches zero or more numeric characters. A safer test for integer is "1N0N" : @VM : "'-'1N0N". Your pattern matches "".

6. Documentation on the General tab is extremely sparse. It should contain usage instructions, ideally with examples, and perhaps a reference to the Test grid and the fact that ad hoc testing should be at the end of the grid.
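
Points 1, 3 and 5 above could be combined at the top of a routine roughly as follows. This is a sketch under assumptions, not code from Scott's routine: pString, pDelimiter and pOccurrence are illustrative argument names, and the Field call in the valid branch only stands in for the routine's own logic.

```basic
* Sketch only: argument names are illustrative.
IntPattern = "1N0N" : @VM : "'-'1N0N"
DecPattern = "1N0N'.'0N" : @VM : "'-'1N0N'.'0N"
Begin Case
   Case UnAssigned(pString) Or UnAssigned(pDelimiter) Or UnAssigned(pOccurrence)
      Ans = @NULL   ;* point 1: unassigned argument
   Case IsNull(pString) Or IsNull(pDelimiter) Or IsNull(pOccurrence)
      Ans = @NULL   ;* point 1: null argument
   Case 1
      Occ = pOccurrence   ;* work on a copy so the caller's value is untouched
      * Point 3: truncate a decimal such as 2.7 to 2, or -3.3 to -3
      If Occ Matches DecPattern Then Occ = Field(Occ, ".", 1, 1)
      * Point 5: "1N0N" needs at least one digit, so "" does not pass
      If Occ Matches IntPattern Then
         Ans = Field(pString, pDelimiter, Occ)   ;* placeholder for the real logic
      End Else
         Ans = @NULL
      End
End Case
```

The unassigned test comes first on purpose: Case clauses are evaluated in order, so IsNull is never applied to an unassigned variable.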

Post by sbass1 »

1. I've coded it such that NO arguments are required - whether that's good is another discussion, since that deviates from the requirements for Field:

A) If pString is empty, return empty. If pString is null, return null
B) If pDelimiter is empty, return pString.
C) If pOccurence is empty, default to 1.
D) If pNumSubstr is empty, default to 1.

What do you think the functionality should be in these scenarios?

2. Noted, thanks. Online code updated.

3. Hmmm, is that really "regular DataStage string management", to silently alter the format of required arguments? Other languages I've used would throw an error if the argument was not in the required format, to avoid unintended but silent results. I do specifically check for non-integer arguments, and throw an error if it's not an integer.

4. Thanks, code changed to return an empty string. Null is still returned if there is an error. Online code updated.

5. See #1 above.

6. Thanks. I'll update the long description once the design is finalized based on feedback.

I appreciate the comments!

Post by ray.wurlod »

1. Empty is not the same as null or unassigned. What you have is OK for empty, but the function should return null if any of its arguments is null or in an unassigned state.

3. Open your DataStage BASIC manual and search for "truncated". You'll find that this is documented behaviour for many functions.