remove blank lines from end of file

Jay · Post by **Jay** » Thu Oct 21, 2004 12:41 pm

Hi All,

We get flat files which have blank lines at the end of the file. This creates the "Mismatch in number of columns" error.

How can we get rid of these lines? Would a before-job subroutine calling a shell script work?
Can I get some ideas on how to write the script?

Thanks in advance
J

martin · Post by **martin** » Thu Oct 21, 2004 1:39 pm

Use a Transformer stage, map all the columns, and define the constraint below to eliminate the last newline or blank line.
Make sure you apply it to a key column (or key columns) that does not contain spaces.

DSLink.ColumnName > ' ' (here, take one key column)
or
(Col1:Col2:Col3:Col4) > ' ' (here, you can take multiple key columns)

Good Luck

kduke · Post by **kduke** » Thu Oct 21, 2004 8:19 pm

The sed command is very fast and can delete blank lines. I can't remember the exact syntax, but it is something like

sed '/^$/d'

This says: delete all lines where the beginning of the line (^) is immediately followed by the end of the line ($), with nothing in between.
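
As a minimal sketch of the idea above (not necessarily the exact command intended in this post), the pattern can be wrapped in small shell functions. The function names and the sample data are hypothetical. The second variant is an assumption: it also treats lines containing only spaces, tabs, or a stray carriage return as blank.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the thread.

# Delete every line where ^ (start) is immediately followed by $ (end).
strip_empty_lines() {
    sed '/^$/d' "$1"
}

# Assumed variant: also delete lines holding only whitespace
# (spaces, tabs, or a carriage return left by a DOS-format file).
strip_blank_lines() {
    sed '/^[[:space:]]*$/d' "$1"
}

# Demo on a throwaway file with an empty line, a spaces-only line,
# and a trailing empty line.
sample=$(mktemp)
printf 'a,b\n\n  \nc,d\n\n' > "$sample"
strip_empty_lines "$sample"    # keeps the spaces-only line
strip_blank_lines "$sample"    # keeps only the two data rows
rm -f "$sample"
```

Both functions write to standard output, so the cleaned data can be redirected to a new file for the job to read.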

martin · Post by **martin** » Fri Oct 22, 2004 8:19 am

Would you please educate us on this command: how to use it and where to use it?
Thanks

chulett · Post by **chulett** » Fri Oct 22, 2004 8:27 am

It's the "stream editor", available on any UNIX box. Try typing "man sed" at the command line and check it out!

kduke · Post by **kduke** » Fri Oct 22, 2004 9:34 am

Craig is correct. It is a simple UNIX command. It is very fast. It has very ugly syntax. You suggested that you had many blank lines and this will solve that problem. I think this has been covered so I would do a search on here and maybe Google.

Alternatively, you can ignore these errors in the Sequential File stage. You would then also need a constraint on the output link. So there are a couple of options available to you.

ray.wurlod · Post by **ray.wurlod** » Fri Oct 22, 2004 4:45 pm

Expanding slightly on what Kim said.

You can set the "missing columns" rules in the Sequential File stage (Columns grid - scroll right to find them) to have DataStage ignore the fact that there are missing columns, pad with null, etc.

Then you constrain the output so that these rows are not output, using an output constraint expression in the Transformer stage.

Jay · Post by **Jay** » Mon Oct 25, 2004 2:39 pm

Hi All,

Thanks for all the replies.

I have been playing around with the 'sed' command. I'll let you know what I end up with.

Thanks
j

saprebv · Post by **saprebv** » Tue Oct 26, 2004 5:59 am

Put a constraint in the first Transformer:
trim(field1)<>'' and trim(field2)<>'' and trim(field3)<>'' and .....

Jay · Post by **Jay** » Thu Oct 28, 2004 5:29 pm

Hey All

I ended up with this one-liner...

awk 'length>1' <file_name> | sed -e '/^^/d'

Is this OK?

Thanks all
j
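
Since the original problem is blank lines at the end of the file, here is a hedged sketch that removes only the trailing blank lines and leaves any blank lines inside the data untouched. For comparison, `awk 'length>1'` drops every line of zero or one characters wherever it occurs in the file. The function name is hypothetical, and the set of characters treated as blank (space, tab, carriage return) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print a file without its trailing blank lines.
strip_trailing_blank_lines() {
    awk '
        # Blank line: hold it back instead of printing it right away.
        /^[ \t\r]*$/ { held = held $0 ORS; next }
        # Data line: flush any held blank lines first, then print it.
        { printf "%s", held; held = ""; print }
        # Blank lines still held at end of input are never printed.
    ' "$1"
}
```

A call such as `strip_trailing_blank_lines input.txt > input.clean.txt` (file names are placeholders) could be run from a shell script before the job reads the file.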