Removing trailing special character when source is DB2 data

Chris Zampogna · Post by **Chris Zampogna** » Wed Mar 31, 2010 10:36 am

My source field is a varchar(4) in a DB2 UDB on Linux table. When I view the data through an ODBC Enterprise stage I can see a trailing square. What is this character and how do I remove it using a transformer?

Thanks in advance for your help,
Chris

chulett · Post by **chulett** » Wed Mar 31, 2010 10:39 am

You'd have to identify the character first. Any way to get the hex/decimal equivalent of this "square"? Oracle has a DUMP() function but no clue what DB2 has. That or write it to a flat file and then check it with a hex editor / viewer.

ray.wurlod · Post by **ray.wurlod** » Wed Mar 31, 2010 4:07 pm

Welcome aboard.

What kind of data source is it? In particular is it text file? (That is, is the DSN a reference to a driver for text files?)

Chris Zampogna · Post by **Chris Zampogna** » Thu Apr 01, 2010 7:45 pm

ray.wurlod wrote:Welcome aboard.

What kind of data source is it? In particular is it text file? (That is, is the DSN a reference to a driver for text files?)

Thanks, Ray. The original data source is a text file. I see it in DataStage when I view the data after it has been loaded into the DB2 table. These trailing characters prevent proper joining on this field in lookup stages, which is why I VERY am interested in learning to manage them correctly.

Thanks for your help. It's much appreciated!

ray.wurlod · Post by **ray.wurlod** » Thu Apr 01, 2010 9:29 pm

My guess is that the trailing character is either a UNIX line feed (Char(10)) or the first character of a DOS line terminator (Char(13)). Either way, these should have been cleaned off before being loaded into the DB2 table - this is the correct way to handle this problem.

Once you've found out what the character is, you can append it to the input that's missing it so you can have a lookup that works, and then remove the character downstream, but this approach is really a bandaid - the best approach is to prevent the character getting into DB2 in the first place.

Sairam · Post by **Sairam** » Fri Apr 02, 2010 9:45 am

Hi
As Ray mentioned identify what is the trailing character, most of the time it would be char(10). (Install a freeware Notepad++ which has the ability to display special characters)

Use this perl one liner to substitite that trailing character with blank
Ex :I have used \000 ( null ) with blank

perl -p -i -e "s/\000/ /g" < Absolute Path of the text file >

\000 is the hexdecimal for Null

You can call this perl one liner either as
1) Before Job Routine

Code: Select all

Exec SH   perl -p -i -e "s/\000/ /g" < Absolute Path of the text  file >

2) In a Job sequence Call the Execute Command Activity

Code: Select all

Command : perl -p -i -e "s/\000/ /g" 
Parameters:
< Absolute Path of the text  file >

Thanks

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Fri Apr 02, 2010 9:51 am

Is your job the one loading the text file into DB2? If so, then the problem might be easily fixable. Go to format specifications for the sequential file and change the record delimiter / delimiter string. If it is set to delimiter UNIX, delete that and add delimiter string DOS. Or do the reverse.