Non-English Characters in Fixed-length Char field

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK.

The "Reply to topic" link is at the top and bottom of every page. When you are reading this, look down a little bit.

And that's why I said to increase it to something larger like 100 and see what ends up in the field.
-craig

"You can never have too many knives" -- Logan Nine Fingers
william.eller@ed.gov
Participant
Posts: 19
Joined: Fri Aug 03, 2012 11:06 am

Re: How to reply and workaround

Post by william.eller@ed.gov »

I checked with other teams at my installation - they've all had the same issue. They did the research and tried multiple combinations of file types/mappings/extensions to no avail. The workaround was/is to use a "C" program to read each file, rewriting all recognizable/printable characters to one output file and all others to another, thus stripping out the non-readable characters. I will use this approach.
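
(Purely as an illustration, a minimal sketch of that kind of filter in C: it assumes single-byte data, treats anything outside printable ASCII plus whitespace as "unreadable", and the file arguments and the isprint()/isspace() test are assumptions, not details of the program actually used here.)

Code:

/* Split one input file into a "clean" file of printable characters and a
   "rejects" file holding everything else. */
#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[])
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s input clean rejects\n", argv[0]);
        return 1;
    }

    FILE *in     = fopen(argv[1], "rb");
    FILE *clean  = fopen(argv[2], "wb");
    FILE *reject = fopen(argv[3], "wb");
    if (!in || !clean || !reject) {
        perror("fopen");
        return 1;
    }

    int c;
    while ((c = fgetc(in)) != EOF) {
        /* Printable ASCII and whitespace go to the clean file; every other
           byte is diverted to the rejects file. */
        if (isprint(c) || isspace(c))
            fputc(c, clean);
        else
            fputc(c, reject);
    }

    fclose(in);
    fclose(clean);
    fclose(reject);
    return 0;
}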

Thanks, all, ever so much.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You strip out the so-called "unreadable" characters? Throw away the client's data? :shock: That would be a big no-no here. Are you sure you don't want to fix this instead?
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Our experience is that stripping the characters out isn't the best solution. These characters come from "non-US" keyboards used in other countries to enter critical data (names, addresses, company names) containing special characters with diacritical marks. Stripping the characters out just causes frustration: the users see missing characters in names and addresses and, assuming it is a typo, go and put the characters back in.

A better solution is to map the characters to various "anglicized" alternatives without the special diacritical marks. Almost all of them have alternatives that keep the spelling roughly the same. This "cleanses" the data while letting the users know it wasn't a typo - it's just a limitation of the database.

With that said, we use a similar approach with either C or UNIX commands to process the stream of data and replace (instead of remove) the "bad" characters.
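
(Again purely for illustration, a minimal sketch of the "replace rather than remove" idea in C, assuming the incoming data is single-byte Latin-1; the mapping table below is a small, made-up sample, not the one actually used at this site.)

Code:

/* Filter stdin to stdout, swapping a handful of Latin-1 accented letters for
   plain-ASCII stand-ins so names keep roughly the same spelling. */
#include <stdio.h>

static int anglicize(int c)
{
    switch (c) {
    case 0xC0: case 0xC1: case 0xC2: case 0xC3: case 0xC4: case 0xC5: return 'A';
    case 0xE0: case 0xE1: case 0xE2: case 0xE3: case 0xE4: case 0xE5: return 'a';
    case 0xC8: case 0xC9: case 0xCA: case 0xCB: return 'E';
    case 0xE8: case 0xE9: case 0xEA: case 0xEB: return 'e';
    case 0xC7: return 'C';
    case 0xE7: return 'c';
    case 0xD1: return 'N';
    case 0xF1: return 'n';
    case 0xD6: return 'O';
    case 0xF6: return 'o';
    case 0xDC: return 'U';
    case 0xFC: return 'u';
    default:   return c;   /* leave everything else untouched */
    }
}

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(anglicize(c));
    return 0;
}

Something like that can sit as a filter in the pipeline before the load, e.g. anglicize < input.dat > cleaned.dat, so the downstream job only ever sees plain ASCII.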
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020