Page 1 of 2

Extract data from a Database Field

Posted: Thu Mar 04, 2010 12:52 am
by Dsnew
Hi all,

I have a requirement to extract information from a database field which can have sample values as below

Code: Select all

Austin+Texas;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+;
air-baloon+ride;Austin+Texas;Blue+Sky;Laptop+Pad;Grass+Green;Deep+;
Table+Chair;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+;Austin+Texas;
Blue+Sky;air-baloon+ride;Austin+Texas;Laptop+Pad;Grass+Green;Deep+;
Laptop+Pad;Grass+Green;Deep+;Austin+Texas;Blue+Sky;air-baloon+ride;Cool+Cat;
Basically the logic is that the related data (Paired) is seperated by semi colon
The pair is seperated by a "+" sign
Sometimes the second part of the so called Paired data could be missing.
The order of the pairs is not fixed.


I was envisioning either a Oracle function or Datastage Routine, which can search in the string for the first part and give me the second part of the pair (First part could be passed as an input parameter)
Please guide

Posted: Thu Mar 04, 2010 1:20 am
by ray.wurlod
Consider a server job with a server routine. DataStage BASIC has a wonderful richness of string-handling functions, because everything it does it does on strings.

Code: Select all

FUNCTION ReturnSecondOfPair(aString, aSearchString)

* Convert string into dynamic array.
String = Convert(";", @FM, aString)

* Find search string within string
Find aSearchString In String Setting FMC,VMC,SMC
Then
   Ans = Field(String<FMC>, "+", 2, 1)
End
Else
   Ans = ""
End

RETURN(Ans)
Error handling (such as null argument handling) has been omitted for clarity.

Posted: Thu Mar 04, 2010 1:35 am
by Dsnew
Ray,

Thank you for the quick response.

Code: Select all

TEST #1
*******
 
Arg1 = Austin+Texas;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+; 
Arg2 = Blue
 
Test completed.
 

Result = 
The expected value is Sky, but I get a blank result. Appreciate your valuble inputs on this.

Posted: Thu Mar 04, 2010 2:22 am
by vkhandel
The logic could be implemented in a transformer stage using stage variables as explained below -

Suppose the input column name is "A" storing the entire string, i.e., "Austin+Texas;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+; "

Stage variable: SVarIndexString
Derivation: Index(A,'Laptop',1)
Here "Laptop" is the Arg2 (passed as parameter)

Stage variable: SVarNewString
Derivation: A[SVarIndexString,Len(A) - SVarIndexString]

Stage variable: SVarIndexPairSeparator
Derivation: Index(SVarNewString,'+',1)

Stage variable: SVarIndexValueSeparator
Derivation: Index(SVarNewString,';',1)

And finally in the Derivation Tab, we can fetch the "other part" of the string by using the derivation -
OtherPart = SVarNewString[SVarIndexPairSeparator+1,SVarIndexValueSeparator - SVarIndexPairSeparator]

The output would be "Pad".

Posted: Thu Mar 04, 2010 2:34 am
by ray.wurlod
Dsnew wrote:Ray,

Thank you for the quick response.

Code: Select all

TEST #1
*******
 
Arg1 = Austin+Texas;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+; 
Arg2 = Blue
 
Test completed.
 

Result = 
The expected value is Sky, but I get a blank result. Appreciate your valuble inputs on this.
Please confirm that you used 2 as the third argument of the Field() function.

Posted: Thu Mar 04, 2010 2:42 am
by ray.wurlod
My error. The Find statement should be FindStr.

I just tested it and it works OK.

Obviously there will be some cases where it won't work, where the search string occurs earlier in the overall string than where it's the first element. But none of your examples had that situation.

The solution provided by vkhandel suffers from the identical limitation.

Posted: Thu Mar 04, 2010 12:09 pm
by Dsnew
Ray,
I used your code (latest). I get similar results :(

Code: Select all

TEST #1
*******
 
Arg1 = Austin+Texas;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+;
Arg2 = Blue
 
Test completed.
 

Result = 
vkhandel, Thanks for you code, I have not yet implemented it, will let you know how it goes.

Posted: Thu Mar 04, 2010 3:09 pm
by ray.wurlod
My test results are as follows.
aString: "Austin+Texas;Blue+Sky;air-baloon+ride;Laptop+Pad;Grass+Green;Deep+;"
aSearchString: "Blue"
Ans: "Sky"

Here is the routine:

Code: Select all

FUNCTION ReturnSecondOfPair(aString, aSearchString) 

* Convert string into dynamic array. 
String = Convert(";", @FM, aString) 

* Find search string within string 
FindStr aSearchString In String Setting FMC,VMC,SMC 
Then 
   Ans = Field(String<FMC>, "+", 2, 1) 
End 
Else 
   Ans = "" 
End 

RETURN(Ans)

Posted: Thu Mar 04, 2010 4:36 pm
by Dsnew
It works fine now! Not sure what went wrong with me last time.
Thanks you so much Ray!!!
It worked for all scanario for me, can you please let me know the limitation you said this had, so I can be aware of it.

Posted: Thu Mar 04, 2010 4:51 pm
by Kryt0n
Dsnew wrote:It worked for all scanario for me, can you please let me know the limitation you said this had, so I can be aware of it.
What he means is if you have your search string as a sub-string elsewhere in your name/value list and that substring appears prior to that actual search string e.g.

Stair+1;air+2;

With search string of "air" would get you the 1 and not the 2.

You can for the most cut this off by searching for "<name>+" (this will not work for above example) or almost entirely by searching for ";<name>+" but will need separate handling for the first name/value pair

Posted: Thu Mar 04, 2010 5:02 pm
by chulett
Not really separate handling needed, just stick a ';' on the front of the string before searching.

Posted: Thu Mar 04, 2010 5:14 pm
by ray.wurlod
Umm... not so, because the ";" characters have become field marks, and FindStr and related statements don't include these as data values. However, a trailing "+" would do the trick.

Posted: Thu Mar 04, 2010 8:54 pm
by chulett
Ah... missed that little nuance. Still, bracketing your search string with the internal delimiters can be a handy little trick for stopping substring matches. Generally. :wink:

Posted: Sun Mar 07, 2010 11:56 am
by Dsnew
Kryt0n wrote:What he means is if you have your search string as a sub-string elsewhere in your name/value list and that substring appears prior to that actual search string e.g.

Stair+1;air+2;

With search string of "air" would get you the 1 and not the 2.

You can for the most cut this off by searching for "<name>+" (this will not work for above example) or almost entirely by searching for ";<name>+" but will need separate handling for the first name/value pair
ray.wurlod wrote:However, a trailing "+" would do the trick. ...
How do we handle the above screnario?
Like already mentioned above adding a trailing "+" gives 1 and not 2.
Any help appreciated.

Posted: Sun Mar 07, 2010 12:09 pm
by ray.wurlod
Here is the routine:

Code: Select all

FUNCTION ReturnSecondOfPair(aString, aSearchString) 

* Convert string into dynamic array. 
String = Convert(";", @FM, aString) 
SearchString = aSearchString : "+"

* Find search string within string 
FindStr SearchString In String Setting FMC,VMC,SMC 
Then 
   Ans = Field(String<FMC>, "+", 2, 1) 
End 
Else 
   Ans = "" 
End 

RETURN(Ans)