Page 1 of 1

EXTRACT Numbers only

Posted: Thu Feb 24, 2011 3:45 pm
by sarjushah
We have data on a Teradata table and it contains Numbers followed by some thing (seriously something.. not space..not tab ..not sure what).

Trim while selecting the data or Trim / StripWhitespace in datastage wont get rid of it.

It was loaded from mainframe.. so may be someone goofed up. Thats not my problem.

I am reading the data and I want to get rid of it.

On server jobs we could use the inbuilt DIGITS transform for it to get rid of such stuff.

Code for DIGITS transform "OCONV(%ARG1%,"MCN")"

I know we cannot use OCONV in parallel.

So what are my option on a parallel job.

Posted: Thu Feb 24, 2011 4:32 pm
by vinnz
Please search the forum. You could use a BASIC transformer or alternatively use the convert function

Posted: Thu Feb 24, 2011 4:51 pm
by sarjushah
But I dont know what I am converting from. It looks like a space but it is not a space.

I tried TRIM on the database, and Datastage but it does not goaway.

Posted: Thu Feb 24, 2011 5:34 pm
by ray.wurlod
Use a hex editor to actually look at what's there. That will guide your future steps.

Posted: Fri Feb 25, 2011 8:44 am
by DSguru2B
You can look into writing your own C/C++ code. Look into the isprint() function. It checks if the character is printable or not. Search for pxEreplace() here in dsxchange and see how i am taking in a string and looping through each and every character.
Once you get the looping mastered pass it through the isprint(), if its true, keep it, if not skip it. Return the end result and that will give you string without unprintable characters.

Posted: Fri Feb 25, 2011 9:00 am
by chulett
Which isn't quite what the OP was asking for but which would probably have the same end result in this situation. You could also build another very similar function that checks each character for a numeric digit and only passes that out as the result, that would match what the Server DIGITS transform does.

Posted: Fri Feb 25, 2011 9:48 am
by sarjushah
chulett - Thats exactly what I am looking for a new function which would do what the DIGITS does.

But I dont know how to go about it.

Writing a C program for this would be a too much to ask. There has to be a simpler way to check for numbers.

Ray. I am not sure how to read the data in Hex format in Teradata. But I dont want to know the hex values of what else is there. Instead I would still like to write a generic function that just does what the digits transform does. extract numbers from a string.

Thanks
Sarju Shah

Posted: Fri Feb 25, 2011 2:07 pm
by ray.wurlod
This is a useful technique: it has been posted previously.

Code: Select all

Convert(InLink.TheField, Convert(InLink.TheField, "0123456789", ""), "")
The inner Convert() function returns a string of all non-numeric characters in the field; the outer Convert() function removes all of those leaving only numeric characters.

Posted: Fri Feb 25, 2011 3:53 pm
by chulett
That's true, forgot about the dreaded "Double Convert". :wink: