Page 1 of 1

Converting Special Characters in Datastage

Posted: Tue Jan 26, 2010 4:45 pm
by brownnt
I am reading a file that has a multi-value field. The data in the multi-value field is separated with a special character( looks like a box). When we used oracle 9 it would convert the box to a question mark. Then in the transformer, I would convert the ? to blanks. Now that we have converted to oracle 10g. It does not convert it to a question mark instead it leaves it as a box. How can I convert this special character (box) to blank within datastage.

Thanks!!

Posted: Tue Jan 26, 2010 4:51 pm
by chulett
You need to determine exactly what this character is. Get yourself a hex editor (many free ones out there) and use that to find the hex value, then we'll see just how special it really is. :wink:

Re: Converting Special Characters in Datastage

Posted: Wed Jan 27, 2010 9:11 am
by PhilHibbs
First of all, I would make sure the NLS mapping is right for the source connection - is it UTF8? ISO8859 (extended ASCII)?

Then, I would write a Transformer that breaks up the input value into individual characters and writes out the UNIPOS() value of each one. You can then use this value to do something like CONVERT( UNICHAR(xxx), "?", input.value ), where xxx is the UNIPOS value, and ? is the character that you want to convert it to. Or you could use the UNICHAR(xxx) in a FIELD function call to split on it.

Update: Here is the code for a Routine to convert a string to unicode values

Code: Select all

FUNCTION UniConv(TextIn)
  TextLen= Len(TextIn)
  NewText = ""

  For LC = 1 To TextLen
    NewText = NewText : "," : UniSeq(TextIn[LC,1])
  Next

  Ans= NewText[2,9999]

Posted: Wed Jan 27, 2010 3:11 pm
by ray.wurlod
May I suggest a couple of minor improvements? One is to use the special assignment operator (:=) which is more and more efficient the longer the string gets. The other is to use a fixed width format for each Unicode codepoint.

Code: Select all

FUNCTION UniConv(TextIn) 
  TextLen= Len(TextIn) 
  NewText = "" 

  For LC = 1 To TextLen 
    NewText :=  Fmt(UniSeq(TextIn[LC,1]), "R %5")
  Next 

  Ans= NewText

(An option not implemented is to convert each to hex. That could be achieved by applying an "MX" conversion.)

Code: Select all

FUNCTION UniConvHex(TextIn) 
  TextLen= Len(TextIn) 
  NewText = "" 

  For LC = 1 To TextLen 
    NewText :=  Fmt(Oconv(UniSeq(TextIn[LC,1]), "MX"), "R %4")
  Next 

  Ans= NewText