Page 1 of 1

Extracting the substring from a Char Field

Posted: Mon Dec 17, 2007 9:24 am
by Raamc
Hi All,

I have a column called ADDR4 of type VARCHAR(40).It will hold the 4th Address line of an account. I am getting data as POSTCODE concatenated with STATE in this field.
I need to extract the State from ADDR4.

Here is the sample data
ADDR4 ABCD M22 1 TQ
ADDR4 AB CD BD 22 1 SW

In all the cases i need to extract ABCD.

Please suggest me how to do this in the transformer.

Thanks
Raam

Posted: Mon Dec 17, 2007 9:31 am
by ArndW
Do you want to extract the first 4 characters from the line, but ignoring spaces?

Posted: Mon Dec 17, 2007 10:06 am
by Raamc
I want to extract the STATE name from this column.but i dont know the lenght of the state data. It may variey from record to record. I want to extract the data before POST CODE...(State will be followed by POST CODE)

We can identify POSTCODE by numbers in it.

So my logic is, i want to extract the data before numbers.

Posted: Mon Dec 17, 2007 10:27 am
by Sreedhar
The INDEX function with the DataStage should help you in this case. Please go through the INDEX FUNCTION.


Theoretically it is some thing like this.

Use index function to get the occurrence of numeric data (it returns the location value where you have found the numeric data) having got that do a substring to get the STATE Name.

Hope this helps....


Regards,

SM<

Posted: Mon Dec 17, 2007 11:32 am
by ArndW
I'm not sure how much you are expecting to do here - if your data is coming in different order then you need to write (or buy) software to do this complex chore. QualityStage will do it for you.

The INDEX() function, as mentioned earlier, can let you parse elements from your string. For example, INDEX(In.DataColumn,' ',2) will give you "M22" and "CD" from your sample data.

Posted: Mon Dec 17, 2007 2:02 pm
by ray.wurlod
A fully general solution using this approach would require an unwieldy expression involving ten Index() functions, one for each numeric character, and a lot of Or operators.

You would probably be better off either using QualityStage (as mentioned) which has inbuilt name and address parsing capability, or creating your own parallel routine.

Another possibility, especially if your data volumes are not huge, is to make use of the pattern matching capabilities of the DataStage BASIC language (Matches operator and MatchField() function) using a BASIC Transformer stage.

Posted: Tue Dec 18, 2007 5:15 pm
by Raamc
I have been solved this problem by using the Excel logic and datastage job.

I used the datastage job to bring the data into Uni format and written it into XLS file. In excel i written a formula to compare the strings and sorted out the problem.


Thank you for all your valuable suggessions and support.

Cheers,
Raamc :D

Posted: Tue Dec 18, 2007 5:15 pm
by Raamc
I have been solved this problem by using the Excel logic and datastage job.

I used the datastage job to bring the data into Uni format and written it into XLS file. In excel i written a formula to compare the strings and sorted out the problem.


Thank you for all your valuable suggessions and support.

Cheers,
Raamc :D