EXTRACT Numbers only

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sarjushah
Premium Member
Premium Member
Posts: 40
Joined: Thu May 12, 2005 3:59 pm

EXTRACT Numbers only

Post by sarjushah »

We have data on a Teradata table and it contains Numbers followed by some thing (seriously something.. not space..not tab ..not sure what).

Trim while selecting the data or Trim / StripWhitespace in datastage wont get rid of it.

It was loaded from mainframe.. so may be someone goofed up. Thats not my problem.

I am reading the data and I want to get rid of it.

On server jobs we could use the inbuilt DIGITS transform for it to get rid of such stuff.

Code for DIGITS transform "OCONV(%ARG1%,"MCN")"

I know we cannot use OCONV in parallel.

So what are my option on a parallel job.
vinnz
Participant
Posts: 92
Joined: Tue Feb 17, 2004 9:23 pm

Post by vinnz »

Please search the forum. You could use a BASIC transformer or alternatively use the convert function
sarjushah
Premium Member
Premium Member
Posts: 40
Joined: Thu May 12, 2005 3:59 pm

Post by sarjushah »

But I dont know what I am converting from. It looks like a space but it is not a space.

I tried TRIM on the database, and Datastage but it does not goaway.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use a hex editor to actually look at what's there. That will guide your future steps.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can look into writing your own C/C++ code. Look into the isprint() function. It checks if the character is printable or not. Search for pxEreplace() here in dsxchange and see how i am taking in a string and looping through each and every character.
Once you get the looping mastered pass it through the isprint(), if its true, keep it, if not skip it. Return the end result and that will give you string without unprintable characters.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which isn't quite what the OP was asking for but which would probably have the same end result in this situation. You could also build another very similar function that checks each character for a numeric digit and only passes that out as the result, that would match what the Server DIGITS transform does.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sarjushah
Premium Member
Premium Member
Posts: 40
Joined: Thu May 12, 2005 3:59 pm

Post by sarjushah »

chulett - Thats exactly what I am looking for a new function which would do what the DIGITS does.

But I dont know how to go about it.

Writing a C program for this would be a too much to ask. There has to be a simpler way to check for numbers.

Ray. I am not sure how to read the data in Hex format in Teradata. But I dont want to know the hex values of what else is there. Instead I would still like to write a generic function that just does what the digits transform does. extract numbers from a string.

Thanks
Sarju Shah
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is a useful technique: it has been posted previously.

Code: Select all

Convert(InLink.TheField, Convert(InLink.TheField, "0123456789", ""), "")
The inner Convert() function returns a string of all non-numeric characters in the field; the outer Convert() function removes all of those leaving only numeric characters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's true, forgot about the dreaded "Double Convert". :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply