Need to delete last 2 records in a flat file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

rodre
Premium Member
Posts: 218
Joined: Wed Mar 01, 2006 1:28 pm
Location: Tennessee

Need to delete last 2 records in a flat file

Post by rodre »

I have fixed-width flat files that have two bad records at the end of each file.
The last record, I think, is a carriage return, but I am not sure; it looks like:
The other record has "99999999007563" plus 300 blank spaces. The last digits of this number are the record count.
All other records are only 200 characters long.

Loading this file in a Server job was no problem, but we are migrating to parallel jobs on UNIX and the job aborts because of these two records.

I did a search but none of the suggestions seem to work well. I was able to use sed '/999999990/d' <file name> to eliminate one of the records, but I am concerned that it might also match a legitimate record containing 999999990 and delete that one as well. Also, I did not know how to eliminate the , since it would not compile.

Your help is much appreciated! :)
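One way to see exactly what those last two records contain (for example, whether that final record really is just a carriage return) is to dump the end of the file byte by byte. A minimal check, with a placeholder file name, would be:

Code:

# Show the last two lines with non-printing characters spelled out (\r, \n, etc.)
tail -n 2 input_file.txt | od -c

Anchoring the sed pattern to the start of the line, e.g. sed '/^999999990/d', also reduces the chance of deleting a legitimate record that merely contains that digit string somewhere in a field.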
mail2hfz
Premium Member
Posts: 92
Joined: Thu Nov 16, 2006 8:51 am

Post by mail2hfz »

If you want to get rid of the last two records, delete those lines in a before-job subroutine or in an invocation shell script (sed 'N;$!P;$!D;$d' <<inp file>> > newfile). If you are planning to handle those records instead, please provide the exact criteria with samples.
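For reference, a minimal wrapper around that sed one-liner, with placeholder file names, could look like this:

Code:

#!/bin/sh
# Drop the last two lines of the source file before the parallel job reads it.
# The sed expression keeps a sliding two-line window and discards the final window at end of input.
sed 'N;$!P;$!D;$d' /path/to/input_file.txt > /path/to/input_file_clean.txt

The job would then read the cleaned copy, or the command could be run from an ExecSH before-job call.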
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

With UNIX you can do this:

reclen=200

# Split the file by record length: 200-character lines are good, anything else is bad.
while IFS= read -r line
do
    count=${#line}    # length of the line itself; wc -c $line would treat the text as a file name
    if (( count == reclen )); then
        echo "$line" >> good_data_file.txt
    else
        echo "$line" >> bad_data_file.txt
    fi
done < filename

This will eliminate all the bad rows, regardless of their position in the file.

Alternatively, if only the last two records have to be removed:

count=`wc -l < filename`    # redirect so wc prints only the count, not the file name
count=`expr $count - 2`
head -n $count filename > good_data_filename
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

As mentioned earlier in the posts, sed is a single-line 'simple' command which does the 'quick job'.

Regards
Sreeni
chowdhury99
Participant
Posts: 43
Joined: Thu May 29, 2008 8:41 pm

Post by chowdhury99 »

You may use the awk command 'length == 200' to separate out all records with length 200.
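A sketch of that approach, with placeholder file names, might be:

Code:

# Keep the 200-character records and divert everything else to a reject file
awk 'length == 200' input_file.txt > good_records.txt
awk 'length != 200' input_file.txt > bad_records.txt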

Thanks
rodre
Premium Member
Posts: 218
Joined: Wed Mar 01, 2006 1:28 pm
Location: Tennessee

Post by rodre »

Thank you for all the suggestions. I am new to working with UNIX, and I ran the code below but it is failing. Can anyone let me know what I am doing wrong? :roll:

Code:

reclen=199 
cat /apps01/Int/Work/SAS/Files/seqSourceFile.txt |while read line do 
{ 
count=`wc -c $line` 
if (($count == $reclen)) ; then 
echo $line >> /apps01/Int/Work/SAS/Files/seqSourceFile1.txt 
else 
echo $line >> /apps01/Inte/Work/SAS/Files/seqSourceFile2.txt 
} 
done  
It is giving me this error:

/apps01/Int/Work/SAS/Scripts/SASdeleteBadRows.sh: line 17: syntax error near unexpected token `}'
/apps01/Int/Work/SAS/Scripts/SASVABdeleteBadRows.sh: line 17: `}'
dsadm@axxxxxxxxx-0[/apps01/dsadm](!)$
I appreciate your help!
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Remove the braces {} from the code and run it.

We do not need to give C-style beginning and ending markers for loops in the UNIX shell. 'do' is the beginning of the loop and 'done' marks the end of the loop.
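For reference, a version of that loop without the braces, with a ';' before do and a closing fi, and using ${#line} for the length (wc -c $line would try to open the line's text as a file), would look something like this, keeping the paths from the post above:

Code:

reclen=200    # good records are 200 characters long, per the first post
while IFS= read -r line; do
    if (( ${#line} == reclen )); then
        echo "$line" >> /apps01/Int/Work/SAS/Files/seqSourceFile1.txt
    else
        echo "$line" >> /apps01/Inte/Work/SAS/Files/seqSourceFile2.txt
    fi
done < /apps01/Int/Work/SAS/Files/seqSourceFile.txt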
--
Swathi Ch