I have an input string chacters in the following format :
ABCDEF
ABCDEFGHIJ
ABCDEFGHIJKL
And I have a lookup table that stores rules on how to parse out the string character like (blank space is my delimiter) :
XX XXXX XXXX XX
I would like my output to be something like (based on the rule found above) :
AB CDEF
AB CDEF GHIJ
AB CDEF GHIJ KL
Eventually, I would like to have two outputs. One output will store the whole string in the new format as above (for each row).
The second output will insert each parsed out string value into another table along with the order like :
AB , 1
CDEF, 2
GHIJ , 3
KL , 4
What kind of stage variables will allow me to accomplish the above transformations ? I know this must be a piece of cake for all DS experts
here!
Thanks!
How to parse string input into rows by delimiter
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 13
- Joined: Wed Apr 07, 2010 11:50 pm
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you're prepared to use a BASIC Transformer stage this is a very easy task - convert each X into a sharp sign (#) and use that as the mask in a Fmt() function.
There's probably a solution using the parallel Transformer stage. What have you attempted thus far?
There's probably a solution using the parallel Transformer stage. What have you attempted thus far?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 13
- Joined: Wed Apr 07, 2010 11:50 pm
I have looked into using field function with no luck so far. But I am quite new to DataStage,
so it's quite possible that I may be doing something wrong
I am also trying to see if there is a locate function (similar to LOCATE function in DB2) to find the
delimiter position in a string to start but haven't come across anything that sounds like it yet.
Is there a list of stage variable functions in Transformation Stage anywhere that I can do some studies on ?
Thanks!
so it's quite possible that I may be doing something wrong
I am also trying to see if there is a locate function (similar to LOCATE function in DB2) to find the
delimiter position in a string to start but haven't come across anything that sounds like it yet.
Is there a list of stage variable functions in Transformation Stage anywhere that I can do some studies on ?
Thanks!
N.K.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The list of functions is Appendix B in Parallel Job Developer's Guide
Index() finds an occurrence of a substring in a string and Len() gives the number of characters in a string. Those will get you started.
Index() finds an occurrence of a substring in a string and Len() gives the number of characters in a string. Those will get you started.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 13
- Joined: Wed Apr 07, 2010 11:50 pm
Thank you guys. For the time being, I found a workaround and would like to share with everyone (not that it's a common problem!).
I ended up using stage variable FIELD function and save the length of each input in stage variable as follow :
StageVariable1 = Len(Field(Input.config," ",1,1))
StageVariable2 = Len(Field(Input.config," ",2,1))
StageVariable3 = Len(Field(Input.config," ",3,1))
StageVariable4 = Len(Field(Input.config," ",4,1))
Then for each row coming in, I would do a substring and paste each part back together like :
Input.data [1, StageVariable1] : " " : Input.data [StageVariable1 + 1, StageVariable2] : " " : Input.data [StageVariable1 + StageVariable2 + 1, StageVaiable3] : " " : Input.data [StageVariable1 + StageVariable2 + StageVariable3 + 1, StageVariable4]
It is kind of cumbersome, but it works ! :D
If anyone has any other alternatives, please do not hesitate to share!
I ended up using stage variable FIELD function and save the length of each input in stage variable as follow :
StageVariable1 = Len(Field(Input.config," ",1,1))
StageVariable2 = Len(Field(Input.config," ",2,1))
StageVariable3 = Len(Field(Input.config," ",3,1))
StageVariable4 = Len(Field(Input.config," ",4,1))
Then for each row coming in, I would do a substring and paste each part back together like :
Input.data [1, StageVariable1] : " " : Input.data [StageVariable1 + 1, StageVariable2] : " " : Input.data [StageVariable1 + StageVariable2 + 1, StageVaiable3] : " " : Input.data [StageVariable1 + StageVariable2 + StageVariable3 + 1, StageVariable4]
It is kind of cumbersome, but it works ! :D
If anyone has any other alternatives, please do not hesitate to share!
N.K.