Converting Special Characters in Datastage

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
brownnt
Premium Member
Premium Member
Posts: 21
Joined: Tue Feb 03, 2009 6:07 pm

Converting Special Characters in Datastage

Post by brownnt »

I am reading a file that has a multi-value field. The data in the multi-value field is separated with a special character( looks like a box). When we used oracle 9 it would convert the box to a question mark. Then in the transformer, I would convert the ? to blanks. Now that we have converted to oracle 10g. It does not convert it to a question mark instead it leaves it as a box. How can I convert this special character (box) to blank within datastage.

Thanks!!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to determine exactly what this character is. Get yourself a hex editor (many free ones out there) and use that to find the hex value, then we'll see just how special it really is. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Re: Converting Special Characters in Datastage

Post by PhilHibbs »

First of all, I would make sure the NLS mapping is right for the source connection - is it UTF8? ISO8859 (extended ASCII)?

Then, I would write a Transformer that breaks up the input value into individual characters and writes out the UNIPOS() value of each one. You can then use this value to do something like CONVERT( UNICHAR(xxx), "?", input.value ), where xxx is the UNIPOS value, and ? is the character that you want to convert it to. Or you could use the UNICHAR(xxx) in a FIELD function call to split on it.

Update: Here is the code for a Routine to convert a string to unicode values

Code: Select all

FUNCTION UniConv(TextIn)
  TextLen= Len(TextIn)
  NewText = ""

  For LC = 1 To TextLen
    NewText = NewText : "," : UniSeq(TextIn[LC,1])
  Next

  Ans= NewText[2,9999]
Phil Hibbs | Capgemini
Technical Consultant
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

May I suggest a couple of minor improvements? One is to use the special assignment operator (:=) which is more and more efficient the longer the string gets. The other is to use a fixed width format for each Unicode codepoint.

Code: Select all

FUNCTION UniConv(TextIn) 
  TextLen= Len(TextIn) 
  NewText = "" 

  For LC = 1 To TextLen 
    NewText :=  Fmt(UniSeq(TextIn[LC,1]), "R %5")
  Next 

  Ans= NewText

(An option not implemented is to convert each to hex. That could be achieved by applying an "MX" conversion.)

Code: Select all

FUNCTION UniConvHex(TextIn) 
  TextLen= Len(TextIn) 
  NewText = "" 

  For LC = 1 To TextLen 
    NewText :=  Fmt(Oconv(UniSeq(TextIn[LC,1]), "MX"), "R %4")
  Next 

  Ans= NewText
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply