Page 1 of 1

Special characters in data

Posted: Wed Oct 02, 2013 9:59 am
by raji33
Hi All,

I have source file as csv .

Code: Select all

id          empid     name
A123456     24567     ABC
UK45678               DEF
length of empid in source is 6 but in target file it is 20

in unix i am seeing as

Code: Select all

id          empid       name
A123456     247476^M    ABC
UK45678     ^M          DEF
In datastage iam seeing as 24567 with some special character. i tried to use trim function in datastage does'nt work. Any suggestions?

Posted: Wed Oct 02, 2013 10:38 am
by ArndW
Control-M is a "Carriage Return".

Are your CSV file field delimited by spaces as you show? Are the text fields VarChar or fixed width? When you read your file, did you specify UNIX or WINDOWS line breaks?

Posted: Wed Oct 02, 2013 11:39 am
by chulett
ArndW wrote:Control-M is a "Carriage Return".
And a carriage return is a CHAR(13). FYI. However, you should still answer Arnd's questions.

Posted: Wed Oct 02, 2013 1:44 pm
by ray.wurlod
The CSV file was created in Windows. Change the Record Delimiter property to Record Delimiter String and set its value to "DOS Style".

Posted: Wed Oct 02, 2013 3:07 pm
by chulett
If the ^M characters were at the end of each line I would agree with you. In the middle? Not so much. :wink:

Posted: Wed Oct 02, 2013 3:33 pm
by FranklinE
I agree with Craig.

I've seen this behavior in data originated in a UI where the entry method was free-form text with no edit-validation when written to the database. Any keystroke or combination is accepted as a "character" in the usual GIGO sense.

I would question the method by which the csv was created in the first place. I don't agree that citing Windows is enough to determine what's happening. Data is data, says this old dog, and the new tricks they give users just creates data headaches.

Posted: Thu Oct 03, 2013 5:55 am
by MarkB
I also agree, though if you look at the example, it seems that EmpID is the field that has the embedded carriage returns. One would think that a field such as that would not be free-form text, but you never know.

I would suggest getting with the provider of the file and having them clean it up, or you are probably going to have a fun time trying to clean it up on your own so you can load it. Since you say you are on UNIX, and this is a csv file, you could always run the file through a tr command to remove the carriage returns first. Personally, if the file was documented as being a CSV file, I'd make every attempt to have the provider of the file give you the file in proper format, but that's just me.

Posted: Sun Oct 06, 2013 11:14 am
by rsomiset
Please try the below for removing control-m and see if it helps.
1) dos2unix or
2) sed -e "s/^M//" filename > newfilename
To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key then press V and M in succession