Substring problem

gateleys · Post by **gateleys** » Fri Apr 28, 2006 12:02 pm

I am using DS v7.5.1A on Windows. How can we extract just the filename from an input string that can take any of these forms-
FileName "E:\\Ascential\\DataStage\\Projects\\TestProj\\DataDir\\myfilename"
or
FileName "E:\\Ascential\\DataStage\\Projects\\TestProj\\DataDir\\myfile.txt"
or
FileName "myfilename3"
or
FileName "file.text"
or
FileName "C:\\DataFiles\\somefile.txt"

NOTE: The string FileName is part of the input string, and the quotes appear in the input.
I tried to use the FIELD function, but the dot (.) or the (\\) symbol may or may not be present. Besides, the number of instances of (\\) varies. I could not use the LEFT or RIGHT functions for similar reasons. SUBSTR() cannot be used since the names of the files can vary. The EREPLACE function will not get me the filename.

My goal is to replace each string with-
FileName "#FileDir#\\myfilename" (for first example)

FileName "#FileDir#\\myfile.txt" (for second example), and so on.

Can somebody please suggest a simple way?

Thanks,
gateleys

kcbland · Post by **kcbland** » Fri Apr 28, 2006 12:05 pm

Count the slashes and the filename is after the last slash.

Code: Select all

* The file name is whatever follows the last backslash.
SlashCount=COUNT(yourstring,"\")
If SlashCount > 0 Then
   LastWord=FIELD(yourstring, "\", SlashCount+1)
End Else
   LastWord=yourstring
End
* Strip a trailing and then a leading double quote, if present.
If RIGHT(LastWord,1)='"' Then LastWord=LastWord[1,LEN(LastWord)-1]
If LEFT(LastWord,1)='"' Then LastWord=LastWord[2,LEN(LastWord)-1]

chulett · Post by **chulett** » Fri Apr 28, 2006 12:06 pm

Count the number of separators and then dynamically use that to extract the 'last' FIELD. I've done this many times with a single separator character, having '\\' as one should be possible. At worst case, count the number of them and divide by 2.

gateleys · Post by **gateleys** » Fri Apr 28, 2006 12:25 pm

Thanks Kenneth and Craig,
Your method looks logical. I am trying it out. Will seek your help if I face any problem.

gateleys

gateleys · Post by **gateleys** » Fri Apr 28, 2006 1:10 pm

Got it to work, except that the third statement in Kenneth's code needs to add 1 to the SlashCount:

Code: Select all

LastWord=FIELD(yourstring, "\", SlashCount + 1)

Kudos to you.

gateleys
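
For anyone landing here later, the counting approach above can be combined with the replacement gateleys asked for. This is only a sketch, not code posted by anyone in the thread: InLine, Body, BaseName and OutLine are hypothetical variable names, and it assumes each input line has exactly the form FileName "..." shown in the first post. It relies on FIELD returning the whole string when the delimiter is absent and the first occurrence is requested, so no separate test for "no slashes" is needed.

```basic
* Sketch only: InLine, Body, BaseName and OutLine are hypothetical names.
* InLine holds one input line, e.g.  FileName "C:\\DataFiles\\somefile.txt"
Body = Field(InLine, '"', 2)                 ;* text between the first pair of double quotes
SlashCount = Count(Body, "\")                ;* 0 when there is no directory part
BaseName = Field(Body, "\", SlashCount + 1)  ;* last backslash-delimited field
OutLine = 'FileName "#FileDir#\\' : BaseName : '"'
```

Because the doubled backslashes simply produce empty fields between them, counting single backslashes and taking the field after the last one still lands on the file name.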

ray.wurlod · Post by **ray.wurlod** » Fri Apr 28, 2006 4:10 pm

Look in the DataStage BASIC manual for two subroutines that work with pathnames. They are called, if my memory serves, !GET.PATHNAME and !MAKE.PATHNAME

By using these to deconstruct and construct pathnames you will have a completely portable and idiot-proof mechanism.

kcbland · Post by **kcbland** » Fri Apr 28, 2006 4:18 pm

I like using basename, but since it's a Windoze platform that option is out.

sbass1 · Post by **sbass1** » Tue Mar 31, 2009 4:49 pm

I was searching for methods to deal with Pathname / Basename processing, which led me to this thread.

kcbland's post (second one) gave me the idea to write a generic routine to process strings "backwards" (right to left). From this you can derive Pathname and Basename.

Ray's post described the !GET.PATHNAME subroutine, but I was wondering if there is use for a generic routine to process strings right to left given an arbitrary delimiter. Plus I didn't read the entire thread before I was off and running writing the routine.

Doh!

Regardless, I'd love some feedback on whether this routine is useful, can be improved, or just reinvents a perfectly round wheel.

If you're interested in having a look and providing feedback, see http://docs.google.com/Doc?docid=dcdxxj ... sh5g&hl=en for a DSX file. The functionality should be obvious from the routine test cases.

Thanks,
Scott
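
The DSX itself is not reproduced in this thread, so the following is not Scott's routine. It is a minimal sketch of the right-to-left idea, assuming a single delimiter and a positive occurrence number counted from the right; MyString, Delim, Occ, NumFields, Pos and Result are all hypothetical names.

```basic
* Sketch only, not Scott's routine. All variable names are hypothetical.
* Occ counts fields from the right: 1 = last field, 2 = second to last, and so on.
NumFields = DCount(MyString, Delim)      ;* number of delimited fields
Pos = NumFields - Occ + 1                ;* the same field, counted from the left
If Pos < 1 Then
   Result = ""                           ;* Occ reaches back past the start of the string
End Else
   Result = Field(MyString, Delim, Pos)
End
```

With MyString set to C:\DataFiles\somefile.txt, Delim set to a backslash and Occ set to 1, NumFields is 3 and Result is the basename somefile.txt.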

ray.wurlod · Post by **ray.wurlod** » Tue Mar 31, 2009 5:01 pm

I already have such a routine, called FinalDelimitedSubstring. You get it as a bonus when you download my date routines.

sbass1 · Post by **sbass1** » Tue Mar 31, 2009 5:45 pm

ray.wurlod wrote: I already have such a routine, called FinalDelimitedSubstring. You get it as a bonus when you download my date routines.

Thanks Ray. I appreciate you sharing these useful routines.

I think my particular routine is a bit more generic, but I may be recreating functionality already in the product. I thus encourage further feedback.

ray.wurlod · Post by **ray.wurlod** » Tue Mar 31, 2009 6:31 pm

1. Any routine you write for others to use should explicitly test for and handle (usually by returning null, maybe throwing a warning) any unassigned or null input arguments - use UnAssigned() and IsNull() functions to test these.

2. There is no need for a DEFFUN declaration since this routine does not make a recursive call to itself.

3. To conform with regular DataStage string management, you should allow but truncate decimal numeric arguments. 2.7 becomes 2, -3.3 becomes -3, and so on.

Code: Select all

 If arg Matches "1N0N'.'0N" : @VM : "'-'1N0N'.'0N" Then arg = Field(arg, ".", 1, 1)

4. To conform with regular DataStage string management, any reference to a non-existent position should return a zero-length string, rather than null. For example, "if Occurrence has slipped back past beginning of string" return empty string.

5. Beware that "0N" matches zero or more numeric characters. A safer test for integer is "1N0N" : @VM : "'-'1N0N". Your pattern matches "".

6. Documentation on General tab is extremely sparse. It should contain usage instructions, ideally with examples, and perhaps reference to the Test grid and the fact that ad hoc testing should be at the end of the grid.
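
Points 1, 3 and 5 above could be put together roughly as follows. This is a sketch under assumptions rather than the routine under review: pString and pOccurrence are hypothetical argument names, Occ is a hypothetical work variable, and Ans is the variable a DataStage server routine returns. The unassigned test comes first so that IsNull() is never applied to an unassigned variable.

```basic
* Sketch only. pString, pOccurrence and Occ are hypothetical names.
Begin Case
   Case UnAssigned(pString) Or UnAssigned(pOccurrence)
      Ans = @NULL                            ;* point 1: unassigned argument
   Case IsNull(pString) Or IsNull(pOccurrence)
      Ans = @NULL                            ;* point 1: null argument
   Case 1
      * Point 3: drop any decimal part, so 2.7 becomes 2 and -3.3 becomes -3.
      Occ = Field(pOccurrence, ".", 1, 1)
      * Point 5: "1N0N" needs at least one digit, so "" is rejected.
      If Occ Matches ("1N0N" : @VM : "'-'1N0N") Then
         Ans = Occ                           ;* placeholder: the real extraction would go here
      End Else
         Ans = @NULL                         ;* not an integer
      End
End Case
```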

sbass1 · Post by **sbass1** » Tue Mar 31, 2009 8:09 pm

1. I've coded it such that NO arguments are required - whether that's good is another discussion, since that deviates from the requirements for Field:

A) If pString is empty, return empty. If pString is null, return null
B) If pDelimiter is empty, return pString.
C) If pOccurence is empty, default to 1.
D) If pNumSubstr is empty, default to 1.

What do you think the functionality should be in these scenarios?

2. Noted, thanks. Online code updated.

3. Hmmm, is that really "regular DataStage string management", to silently alter the format of required arguments? Other languages I've used would throw an error if the argument was not in the required format, to avoid unintended but silent results. I do specifically check for non-integer arguments, and throw an error if it's not an integer.

4. Thanks, code changed to return an empty string. Null is still returned if there is an error. Online code updated.

5. See #1 above.

6. Thanks. I'll update the long description once the design is finalized based on feedback.

I appreciate the comments!

ray.wurlod · Post by **ray.wurlod** » Tue Mar 31, 2009 8:19 pm

1. Empty is not the same as null or unassigned. What you have is OK for empty, but the function should return null if any of its arguments is null or in an unassigned state.

3. Open your DataStage BASIC manual and search for "truncated". You'll find that this is documented behaviour for many functions.
