Special characters in data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
raji33
Premium Member
Premium Member
Posts: 151
Joined: Thu Sep 23, 2010 9:21 pm
Location: NJ

Special characters in data

Post by raji33 »

Hi All,

I have source file as csv .

Code: Select all

id          empid     name
A123456     24567     ABC
UK45678               DEF
length of empid in source is 6 but in target file it is 20

in unix i am seeing as

Code: Select all

id          empid       name
A123456     247476^M    ABC
UK45678     ^M          DEF
In datastage iam seeing as 24567 with some special character. i tried to use trim function in datastage does'nt work. Any suggestions?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Control-M is a "Carriage Return".

Are your CSV file field delimited by spaces as you show? Are the text fields VarChar or fixed width? When you read your file, did you specify UNIX or WINDOWS line breaks?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ArndW wrote:Control-M is a "Carriage Return".
And a carriage return is a CHAR(13). FYI. However, you should still answer Arnd's questions.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The CSV file was created in Windows. Change the Record Delimiter property to Record Delimiter String and set its value to "DOS Style".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If the ^M characters were at the end of each line I would agree with you. In the middle? Not so much. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I agree with Craig.

I've seen this behavior in data originated in a UI where the entry method was free-form text with no edit-validation when written to the database. Any keystroke or combination is accepted as a "character" in the usual GIGO sense.

I would question the method by which the csv was created in the first place. I don't agree that citing Windows is enough to determine what's happening. Data is data, says this old dog, and the new tricks they give users just creates data headaches.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
MarkB
Premium Member
Premium Member
Posts: 95
Joined: Fri Oct 27, 2006 9:13 am

Post by MarkB »

I also agree, though if you look at the example, it seems that EmpID is the field that has the embedded carriage returns. One would think that a field such as that would not be free-form text, but you never know.

I would suggest getting with the provider of the file and having them clean it up, or you are probably going to have a fun time trying to clean it up on your own so you can load it. Since you say you are on UNIX, and this is a csv file, you could always run the file through a tr command to remove the carriage returns first. Personally, if the file was documented as being a CSV file, I'd make every attempt to have the provider of the file give you the file in proper format, but that's just me.
rsomiset
Premium Member
Premium Member
Posts: 46
Joined: Fri Sep 21, 2007 7:16 pm

Post by rsomiset »

Please try the below for removing control-m and see if it helps.
1) dos2unix or
2) sed -e "s/^M//" filename > newfilename
To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key then press V and M in succession
--
Raj
Post Reply