Extracting a Part from the given String
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 107
- Joined: Wed Sep 29, 2004 10:15 am
Extracting a Part from the given String
Hi All
I had strucked up with a specefic Problem and need some guidence to resolve it.
I had a String like "BOLT, W/NUT, P/N P422 AAAA,P.,&" and i had to extract a part of string like "P/N P422".
The Problem is that the Part of the string that has to be extract can exist any where in the given string .
My Actual String : BOLT, W/NUT, P/N P422 AAAA,P
My Output : P/N P422
Need Some Kind of Functions in Datastage that will fetch the part of string from the given string.
Your Kind response is Appreciated.
Thanks
RK
I had strucked up with a specefic Problem and need some guidence to resolve it.
I had a String like "BOLT, W/NUT, P/N P422 AAAA,P.,&" and i had to extract a part of string like "P/N P422".
The Problem is that the Part of the string that has to be extract can exist any where in the given string .
My Actual String : BOLT, W/NUT, P/N P422 AAAA,P
My Output : P/N P422
Need Some Kind of Functions in Datastage that will fetch the part of string from the given string.
Your Kind response is Appreciated.
Thanks
RK
-
- Participant
- Posts: 107
- Joined: Wed Sep 29, 2004 10:15 am
DataStage has more string functions than you will normally use and doing what you want is quite easy - if you specify the rules. If you are using a "," (comma) to separate component parts of a string then the FIELD() function will let you do that. But you have also stated that your substring can occur in any field. Are you looking for that one particular substring or one that starts with "P/N"?
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
RK,
If you need to identify/extract more than just this one type of string then a QualityStage standardisation job would allow you to classify and extract tokens from the string regardless of their input format.
In this case, the classification file of your selected ruleset would define identifying tokens like 'P/N', 'BOLT', etc.
JMcK.
If you need to identify/extract more than just this one type of string then a QualityStage standardisation job would allow you to classify and extract tokens from the string regardless of their input format.
In this case, the classification file of your selected ruleset would define identifying tokens like 'P/N', 'BOLT', etc.
JMcK.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
How do you define "P/N P422" is the required the output. Is it a fixed value for all the set of input srtings. If so you just need to find whether "P/N P422" exist or not. In that case, you can DCount(). If you have another table from which you need to pick up the value you can do lookup and use the function. Or you have start position and end position, you can extract.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 107
- Joined: Wed Sep 29, 2004 10:15 am
Hi All
Thanks for your Great Response
1. I had to extract "P/N xxxxxxx" value that has been any where in the Given String
2. I had to Split the Given String into two Parts
Like for Example
Description = "BOLT, W/NUT, P/N P422 WISS,J.,&"
From this I had to extract 2 Strings
a) Part = "P/N P422"
b) Restofdata = "BOLT, W/NUT, WISS,J.,&"
Now I hope my requirement is clear
Pls advise how to write a Routine to solve this
Means Steps i had to Follow
Thanks in Advance
RK
Thanks for your Great Response
1. I had to extract "P/N xxxxxxx" value that has been any where in the Given String
2. I had to Split the Given String into two Parts
Like for Example
Description = "BOLT, W/NUT, P/N P422 WISS,J.,&"
From this I had to extract 2 Strings
a) Part = "P/N P422"
b) Restofdata = "BOLT, W/NUT, WISS,J.,&"
Now I hope my requirement is clear
Pls advise how to write a Routine to solve this
Means Steps i had to Follow
Thanks in Advance
RK
As mentioned you can use Index(), which will give you the starting position of "P/N", you can get the substring of required string from the main string by Input.Field[Index(Input.field,"P/N"),9].
And for the second one, you can get the substring from the first char to the point where you find "P/N" and concatinate with the string which starts from 9 char after the the existence of "P/N" to the end.
And for the second one, you can get the substring from the first char to the point where you find "P/N" and concatinate with the string which starts from 9 char after the the existence of "P/N" to the end.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 107
- Joined: Wed Sep 29, 2004 10:15 am
Hi All
Thanks for your response
But the problem is i am not sure the length of P/N . It may range from 0 to 20 digits .
As the String Contains the Data as "P/N" as last characters of whole Description then the Length of "P/N" is 0. And this P/N can Lies anywhere in the Given Description.
What are the Possible Ways to Attack this Problem.
Thanks in Advance
RK
Thanks for your response
But the problem is i am not sure the length of P/N . It may range from 0 to 20 digits .
As the String Contains the Data as "P/N" as last characters of whole Description then the Length of "P/N" is 0. And this P/N can Lies anywhere in the Given Description.
What are the Possible Ways to Attack this Problem.
Thanks in Advance
RK
Spend a little time thinking about it.
What marks the end of this part number? You'll need to find the position of 'P/N' in the field. Four positions from there the actual part number starts. It ends... where? The next space you encounter? The end of the string? A comma? All of the above?
You're going to need to write a fairly sophisticated string parsing routine to do this correctly 100% of the time. You've got all the tools you need in the various DataStage string functions. They're all noted in the Help. All that's left is for you to put it all together.
And rather than just start coding, write it out in words first. Once you think you've got it all nailed down, then turn it into code.
![Confused :?](./images/smilies/icon_confused.gif)
What marks the end of this part number? You'll need to find the position of 'P/N' in the field. Four positions from there the actual part number starts. It ends... where? The next space you encounter? The end of the string? A comma? All of the above?
You're going to need to write a fairly sophisticated string parsing routine to do this correctly 100% of the time. You've got all the tools you need in the various DataStage string functions. They're all noted in the Help. All that's left is for you to put it all together.
And rather than just start coding, write it out in words first. Once you think you've got it all nailed down, then turn it into code.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
My choice would be MatchField().
The MatchField() function here returns everything between "P/N " (with trailing space) and the next space character. It will return "" if "P/N " does not occur in the string, or if the part number is at the very end (no trailing space). You can get around the latter with a pair of MatchField functions.
Etiquette Note
Pushing back after less than 30 minutes because no-one had replied was, frankly, offensive. This is an all-volunteer site, and some of us are in different time zones to you. And we even think about our responses, which can take time.
Code: Select all
"P/N " : MatchField(InLink.Description,"0X'P/N '0X' '0X",3)
Etiquette Note
Pushing back after less than 30 minutes because no-one had replied was, frankly, offensive. This is an all-volunteer site, and some of us are in different time zones to you. And we even think about our responses, which can take time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Well, a few suggestions in writing the routine.
Read in the entire description and take four positions at a time and check for "P/N ".
If it doesnt match, store it in one variable which will be called your 'Restofdata' and if it matches, store it another variable untill you hit ,
Something like
I am sure this is not the best logic, but off the top of my head, this is what i could come up with.
'Part' will contain the part you want and 'RestOfData' is the rest of the data. You can concatenate the two together with either a pipe or some other character that will not showup in your Desc data and return that back.
Read in the entire description and take four positions at a time and check for "P/N ".
If it doesnt match, store it in one variable which will be called your 'Restofdata' and if it matches, store it another variable untill you hit ,
Something like
Code: Select all
Desc=TRIM(Arg1)
RestOfData=''
Part=''
EndForLoop=LEN(Desc)
FOR n = 1 TO EndForLoop
IF Desc[n,4] = 'P/N '
THEN
Part=Field(Desc[n,EndForLoop-n+1],1,2)
RestOfData=Desc[1,n-1]:Desc[n+Len(Part),EndForLoop-Len(Part)]
END
NEXT n
'Part' will contain the part you want and 'RestOfData' is the rest of the data. You can concatenate the two together with either a pipe or some other character that will not showup in your Desc data and return that back.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I revisited my routine above and found some minor logical errors in it. I fixed it. You can use it now. Try to test it for different scenarios. I, currently do not have access to Datastage and hence cannot test it myself.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 107
- Joined: Wed Sep 29, 2004 10:15 am
-
- Charter Member
- Posts: 166
- Joined: Wed Mar 16, 2005 6:52 am
- Location: Mumbai, India
Hi rkdatastage .
Try this one.
You may have to modify the If statement to suit your data.
Try this one.
You may have to modify the If statement to suit your data.
Code: Select all
InString=Arg1
StringLength=len(InString)
* Get the Starting Location for Part Number
PnStart=INDEX(InString,"P/N",1)
* Init End of Part Number for testing.
PnEnd=StringLength
* To test for Endo of the Part Number. Part number begins at the 5th position from the "P" of "P/N"
FOR PnEndTester=(PnStart+5) TO StringLength
* This is the check to find the end of Part Number. Change this to suit your part number Valid Characters
IF NUM(InString[PnEndTester,1]) AND InString[PnEndTester,1]<>" " THEN
* Left in for Debugging
PRINT "PNENDTESTERIf ":PnEndTester: ":" :InString[PnEndTester,1]
END
ELSE
* This char is not a valid part number character
PnEnd=PnEndTester-1
* Left in for Debugging
PRINT "PNENDTESTERElse ":PnEndTester: ":" :InString[PnEndTester,1]: ":" :PnEnd
* We have the end position we want. no need to continue.
GOTO ExitFor
END
NEXT PnEndTester
ExitFor:
* Get the Part number String
PNString=InString[PnStart,PnEnd-PnStart+1]
* Get the Remaining String
RemString=InString[1,PnStart-1]:InString[StringLength-PnEnd]
Ans=PNString:@FM:RemString
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>