UNIX files vs DOS files

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

UNIX files vs DOS files

Post by admin »

Hi All,

We have a situation where we are loading files with the same layout sourced from both UNIX and NT. UNIX files have a newline that consists only of a Linefeed, whereas NT files have a newline that is a Carriage Return followed by a Linefeed. In the Sequential File stage I can specify whether a file is a UNIX or DOS file, but I'd like to be able to process both kinds through the same job.

Now, if I write a bit of job control to OpenSeq the file and ReadSeq a few lines, I find that UniVerse treats the two files as identical (i.e. the length of a line is the same in both).

How can I get DataStage to be as file friendly?
Or
How can I determine in UniVerse BASIC whether the file is in UNIX or DOS format?

thanks,
Gavin

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Where do you have DataStage running? On UNIX or NT?

How do you get the files to this platform? If you use FTP to move the files between DOS and UNIX, it can convert them on the way for you, so that they will always have the line terminator that corresponds to the target platform.

I know this isn't what you asked, but it might be an alternative, depending on your situation.

-----Original Message-----
From: Gavin GCO10 Cooke [SMTP:GCOOKE@QANTAS.COM.AU]
Sent: Saturday, June 09, 2001 5:29 PM
To: informix-datastage@oliver.com
Subject: UNIX files vs DOS files

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

We have DS 4.02 running on NT.
The files are 150-200MB in size, so we gzip them and FTP them in binary mode, which saves us substantial time. Binary mode doubles FTP throughput over ethernet, and gzip also adds a CRC to the files so that you can be sure that the whole file was FTP'd.
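
As an illustration of that integrity check outside DataStage, here is a minimal Python sketch. It assumes only that the transferred file is an ordinary gzip file; the function name gzip_is_intact is hypothetical. The gzip format ends each file with a CRC-32 and a length, and Python's gzip module verifies both when it reaches the end of the data, so reading the file to the end is enough to detect a truncated or damaged transfer.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to its end.

    Hypothetical helper: the name and the chunk size are illustrative only.
    """
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end in chunks; the CRC-32 and length stored in the
            # gzip trailer are checked when the end of the data is reached.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # A failed CRC raises an OSError subclass; a truncated file raises
        # EOFError; damaged compressed data can raise zlib.error.
        return False
```

Running this on the receiving side before unpacking would flag a file that was cut short in transit.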

I actually have a satisfactory workaround: in a piece of DS job control, I read the file in line by line and write it out line by line, which converts any file (UNIX or DOS) to the local DS server format (in this case NT). This takes only 30-60 seconds, plus a few seconds to rename the new file to the original.
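
For anyone wanting to do the same conversion outside DataStage, the read-and-rewrite idea can be sketched in Python. This is not the job control code itself, only an illustration under stated assumptions: the function name normalize_line_endings and its parameters are hypothetical, and the target terminator defaults to whatever the local platform uses.

```python
import os


def normalize_line_endings(src_path, dst_path, newline=os.linesep):
    """Copy src_path to dst_path, rewriting each line terminator as `newline`.

    Hypothetical helper for illustration; handles LF and CR+LF input.
    """
    eol = newline.encode("ascii")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # iterating a binary file splits after each b"\n"
            if line.endswith(b"\r\n"):
                dst.write(line[:-2] + eol)  # DOS terminator
            elif line.endswith(b"\n"):
                dst.write(line[:-1] + eol)  # UNIX terminator
            else:
                dst.write(line)  # final line with no terminator: copy as-is
```

The rename step would then be something like os.replace(dst_path, src_path) once the copy has finished.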

I guess it would just be nice if we could do it in DataStage natively!

cheers,
Gavin
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

In the Sequential File stage type you can specify three different kinds of EOL behaviour: UNIX, DOS and none. However, this setting cannot be parameterized, so there's no automatic shift possible (which is what I suspect you were hoping for). In programming (before/after routines, job control routines, etc.), you are, as you observed, using stock standard OpenSeq, ReadSeq, etc., from UniVerse, so it will be EOL-agnostic and you won't have to worry.

The only time you have to worry (in both UniVerse and DataStage) is if you're using ReadBlk, where you have to look after it yourself. Luckily this is easy: the length of a line is given by databytes + (If System(91) Then 2 Else 1). Note that this expression follows the platform the server is running on rather than the file itself. About the only mechanism you have in BASIC to determine which convention a file is using is to search for the first Char(13), then test whether the next character is Char(10).
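
The detection idea can be sketched in Python for anyone checking files outside DataStage. This is a slight variation on the test described above, not a translation of it: it looks for the first Linefeed and checks whether the byte before it is a Carriage Return, which also gives an answer for UNIX files that contain no Carriage Return at all. The names detect_line_ending and terminator_length, and the 64 KB block size, are assumptions made for the example.

```python
def detect_line_ending(path, block_size=65536):
    """Return "DOS", "UNIX", or None if the first block holds no Linefeed.

    Hypothetical helper: only the first `block_size` bytes are examined.
    """
    with open(path, "rb") as f:
        block = f.read(block_size)
    lf = block.find(b"\n")
    if lf == -1:
        return None  # no terminator seen in the block that was read
    if lf > 0 and block[lf - 1:lf] == b"\r":
        return "DOS"  # Carriage Return immediately before the Linefeed
    return "UNIX"


def terminator_length(kind):
    """Bytes to add to the data bytes of a line: 2 for DOS, 1 for UNIX."""
    return 2 if kind == "DOS" else 1
```

With the convention known, the on-disk length of a line is its data bytes plus terminator_length(kind), which is the same arithmetic as the ReadBlk formula above but driven by the file rather than the platform.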
Locked