I have the following requirement.
There is a table containing some patterns like
_ _ _ A B C
A _ _ * * *
A B C * * *
Now, from the input stream, lets say we get an input field value as
X Y Z A B C.
We have to do pattern matching for this record using the following rules.
1) We check if the length of the input string matches with the length of any of the patterns in the pattern table. We select the first pattern from the patterns table for which the length matches.
2) Then we check that the corresponding characters in the input string and the pattern string match. Here an '_' or '*' in the pattern string mean that any character at the corresponding position in the input string is acceptable.
For ex - input string 'X Y Z A B C' will match with the first pattern in the patterns table '_ _ _ A B C'
3) Now, once a match is found in the patterns table, we implement the following logic to derive the output field.
a) For every '_' found in the pattern, we replicate the same character at that position as found in the input string. " For every '*' found in the matched pattern, we output a '*' only in the output string.
For example,
1)
Input : - 'X Y Z A B C'
Matched Pattern : - ' _ _ _ A B C'
Output : - 'X Y Z A B C'
2)
Input : - 'A Y Z R S T'
Matched Pattern : - 'A _ _ * * * '
Output : - 'A Y Z * * *'
3)
Input : - 'A B C X Y Z'
Matched Pattern : - 'A B C * * *'
Output : - 'A B C * * *'
The patterns are stored in a patterns table in the database. While the input stream is read through a sequentail file.
Could you please suggest the best way to implement this logic in datastage? Are Routines, SQL Procedures or Unix scripting the best option or is there some other easy way?
Thanks a lot for your help.