Removing CRLF from fixed width text files

javier perez
Participant
Posts: 4
Joined: Tue Jun 28, 2005 5:03 pm
Location: Brisbane, Australia

Post by javier perez »

I'm working with fixed width files containing call centre data, where the source app permits the use of the <enter> key to start a new line within a single field.

Consequently, the fixed width text file contains extraneous CRLFs (these are DOS files) inside the width defined for a single field. I can read the file with a Sequential File stage by adjusting the "Contains terminators" and "Incomplete column" settings for the relevant columns.

However, when I parse it from the sequential stage, each CRLF is counted as two bytes and consequently throws out the definition of the remaining columns in that row by two.

I'm open to suggestions as to how I might solve the problem ...
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Hello Javier,

the <CR><LF> terminator is in fact 2 bytes, so that shouldn't throw off your column width at all. So if you define your columns with the appropriate fixed widths you will be able to read them into a DataStage job with no problem.

You can then write a Transformer stage derivation for the free-text field to do whatever you wish with these embedded <CR><LF>. I would replace them with some otherwise unused character that could still be interpreted as a line break by an application downstream. Let's take the "|" (pipe) character as the replacement. Since <CR><LF> is CHAR(13) followed by CHAR(10), your derivation could read EREPLACE(In.StringColumn,CHAR(13):CHAR(10),'|')
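To illustrate what that derivation does to the field, here is a minimal Python sketch (not DataStage BASIC). The function name and the sample value are hypothetical. It also shows one assumption-laden way to keep the field width stable: because a two-byte CRLF is swapped for a one-byte pipe, the result is padded back to the original width with spaces.

```python
CRLF = "\r\n"  # CHAR(13):CHAR(10), i.e. carriage return then line feed


def replace_embedded_crlf(field: str, replacement: str = "|") -> str:
    """Replace each embedded CRLF, then right-pad to the original width.

    Hypothetical helper for illustration only. Padding is an assumption:
    it is needed only if downstream consumers still expect a fixed width.
    """
    width = len(field)
    return field.replace(CRLF, replacement).ljust(width)


# Sample 20-character field with one embedded CRLF (made-up data).
field = "line one\r\nline two  "
cleaned = replace_embedded_crlf(field)
```

Without the padding step the cleaned value would be one character shorter per replaced CRLF, which matters if the field is written back out as fixed width.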
javier perez
Participant
Posts: 4
Joined: Tue Jun 28, 2005 5:03 pm
Location: Brisbane, Australia

Post by javier perez »

Thanks for the response ... you're right, I didn't expect the behaviour I got either. I did initially replace the CRLF with a pipe, but that affected the column width.
I've now found a solution ... it seems so simple I must admit to being a little embarrassed.
Solution?
Define 2 columns in the initial parse: the first covers everything except the final character and the 'real' CRLF, and is allowed to contain line terminators; the second holds the remainder. I then transform the first column, replacing each CRLF with an empty string.
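For readers outside DataStage, the two-column idea can be sketched in Python. This is only an illustration under stated assumptions: every record has the same byte length, the 'real' terminator is a two-byte CRLF after each record, and the names `read_records`, `split_record`, and the sample data are all hypothetical.

```python
from typing import Iterator, Tuple

CRLF = b"\r\n"


def read_records(data: bytes, record_len: int) -> Iterator[bytes]:
    """Yield fixed-length records, skipping the real CRLF after each one.

    Records are cut by byte count, so a CRLF embedded inside a field
    is never mistaken for the end of the row.
    """
    step = record_len + len(CRLF)
    for start in range(0, len(data), step):
        yield data[start:start + record_len]


def split_record(record: bytes, text_width: int) -> Tuple[bytes, bytes]:
    """Split into (free text with embedded CRLFs removed, remainder)."""
    free_text = record[:text_width]
    rest = record[text_width:]
    return free_text.replace(CRLF, b""), rest


# Two made-up 7-byte records: a 6-byte text field plus a 1-byte final field.
data = b"ab\r\ncdX\r\n" + b"efghijY\r\n"
parsed = [split_record(rec, 6) for rec in read_records(data, 7)]
```

Slicing by byte offsets rather than splitting on line terminators is the design choice that makes the embedded CRLFs harmless here.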
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you're using a Sequential File stage there may be an even easier answer. On the Columns grid scroll across to the right where there's a "contains line terminators" rule. The stage itself can handle this situation!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
The CRLF, AKA Char(13):Char(10), is not related to the problem you have in the manner you think.
The CRLF is your file's row terminator.
The fact you're having this problem is simply due to the fact that your table definition is not correct for the file at hand.
From what you say, it seems that your table definition is longer or shorter than it needs to be.

Go over the columns and find the miscounted length, the missing column, or anything along those lines.

Did you by any chance forget that in fixed width files it is the display attribute of the table definition that determines the field lengths, and not the length attribute?

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org