Read sequential file with ^M in record

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Druid_Elf
Participant
Posts: 32
Joined: Thu Aug 28, 2008 5:53 am

Read sequential file with ^M in record

Post by Druid_Elf »

Hi all,
I have a request in which we need to process a file deliverd by business (CSV).
I've configured the sequential file stage to "Record delimiter string: DOS" , and the section Field Defaults: "Delimiter= |"

The problem is that the column "process" can have a carriage return, and they also want to see this carriage return in the database to which we need to upload the file.

Example of the file record:
"<NULL>"|"<NULL>"|"False"|"12:00 u"|"<NULL>"|"Access"|"Tabel: Pakketten Sales per channel^M
Excel: C+ migraties in input sheet IDTV"|"<NULL>"|"


In vi i can see that there is also a ^M present in the column itself. I've tried different settings but the record is always send to the reject link with following warning:
Sequential_File_0,0: Input buffer overrun at field "Process"

My question is: Is the sequential file stage able to read records where a column in the file contains a carriage return ? And if so, what settings do you need to use in the stage ?

Thanks for any info you can provide.
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

hi,

you can remove that additional characteres at sequential file level by using the vi command like

:%s/[ctrlkey+v and ctrl-key+M]//g
but actual command would be :%s/^V^M//g
D.N .MURTHY
Druid_Elf
Participant
Posts: 32
Joined: Thu Aug 28, 2008 5:53 am

Post by Druid_Elf »

But doing that would remove the carriage return from the string.
But business still wants to see the carriage return in database.
So
Tabel: Pakketten Sales per channel^M
Excel: C+ migraties in input sheet IDTV"|"<


needs to be stored in the database as
Tabel: Pakketten Sales per channel
Excel: C+ migraties in input sheet IDTV
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

When you open a file in unix which has been edited in windows 'Control M' character appears.

Its the 'next line' character.

Use dos2unix command on the file in unix and it will remove these characters like a magic wand

Regards
Sreeni
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Druid_Elf wrote:Business still wants to see the carriage return in database.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

I guess you can overcome this issue when you define that particular field as LOB in table... Anyways, DB experts can give you more inputs..
When you have Control-M in LOB fields, DS handles them automatically.

Other way is to use Server job ;)
Kandy
_________________
Try and Try again…You will succeed atlast!!
Druid_Elf
Participant
Posts: 32
Joined: Thu Aug 28, 2008 5:53 am

Post by Druid_Elf »

I tried with the server job and my problem is solved :)
Thanks kandyshandy.

In the column definition you can specify the setting "contains terminators", if you set this value to yes, it will keep the CR in the column and the record does not get discarded.

Thanks all for your advice. Appreciate it :D
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just want to make sure you're aware why that worked. It's because what you actually have there is a CR/LF pair. You see the CR in the data and the record break from the LF terminator. It's the LF that the option allowed you to ignore.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply