Stripping header and trailer record from input files
Hi,
There are many topics out there which almost answer my question, but none which actually do. Hopefully someone will be good enough to give me a hand.
I have a job that processes a number of files on each run. The job currently starts with a sequential file read by file pattern. Each file consists of a header record (record number 1), a trailer record (the last record), and data records in between. The data records are comma delimited.
When I run the job, the read rejects the headers and trailers as they do not match the metadata. This currently writes a number of warnings to the log which, if I am processing enough files, exceeds the warning limit and aborts the job.
I can reject the header and trailer of course, but this still writes the warnings to the log. Is there some way of stopping the warnings from being written to the log?
Alternatively, I can preprocess the files to remove the header and trailer. Unfortunately, as the read is by file pattern, the file filter option is not available, and so I cannot use a sed command to remove the first and last record.
I also tried to run the data through a Filter stage, but that led to further data rejects in the initial read.
Could anyone save me from going bald by giving me a pointer or two please?
Thanks, as always.
John Fitz
If the number of files being read through the file pattern is equal to the number of nodes used, then you can use the row number and file name options in the Sequential File stage and thereby filter out the header and trailer in a Transformer stage. Otherwise it's better to write a Unix script and call it from a before-job routine.
Nag
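Where a pre-processing script is the route taken, stripping the first and last line of each file is a small job for sed. The following is a minimal sketch rather than a tested before-job routine: the function name `strip_header_trailer`, the `.body` suffix and the output directory argument are all assumptions made for illustration.

```shell
# Hypothetical helper: copy each input file to "$out_dir/<name>.body"
# with its first and last line removed.
strip_header_trailer() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    for f in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        # sed: delete line 1 (the header) and the last line, $ (the trailer).
        sed '1d;$d' "$f" > "$out_dir/$(basename "$f").body" || return 1
    done
}
```

It could be called as `strip_header_trailer /tmp/stripped /data/in/*.csv` (hypothetical paths), with the job then reading the stripped copies instead of the originals. A file holding only a header and a trailer produces an empty output file.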
-
Gain some information ahead of processing, particularly the line count in the file (wc -l command). You can use that in a Transformer stage (executing in sequential mode or in a server job) to filter on @INROWNUM.
Code:
@INROWNUM <> 1 And @INROWNUM <> paramLineCount
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
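The same line-count idea can be tried outside the job on a single file. This is only a sketch under assumptions: `strip_by_line_count` is a made-up name, awk's `NR` stands in for @INROWNUM, and the variable `n` stands in for paramLineCount.

```shell
# Hypothetical helper: print every line of one file except the first and last,
# using a line count gathered ahead of the filtering step.
strip_by_line_count() {
    file=$1
    # "wc -l < file" prints only the count; tr strips the padding
    # that some wc implementations add.
    line_count=$(wc -l < "$file" | tr -d ' ')
    # Keep a row only if it is neither row 1 nor row number line_count.
    awk -v n="$line_count" 'NR != 1 && NR != n' "$file"
}
```

Note that `wc -l` counts newline characters, so if the trailer record is not terminated by a newline the count is one short and the wrong row is dropped. The sketch also handles one file at a time, since the count belongs to a single file.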
-
Morning,
Thanks to all for the responses.
To answer a few of the queries raised:
1. The header is marked with HR in the first 2 characters and the trailer is marked with TR.
2. I have tried to read it as a single field, but the read is rejecting all the data records.
Is it possible to turn off all the warning messages when writing to a reject file?
Regards,
John Fitz
-
Hi,
I created a separate job to preprocess the files. It reads each record in as one field, passes the data through a Transformer to isolate the first 2 characters, then uses this new field in a Filter stage to identify headers and trailers (it might be processing multiple input files), and finally writes the data records to a new sequential file.
I then modified the original job to read the new sequential file instead of the multiple input files.
This works, but I cannot help thinking that it is fairly inefficient, what with having to create a new file and subsequently delete it as part of the process.
Any thoughts on this workaround would be gratefully received.
Regards,
John Fitz
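For comparison, the logic of that preprocessing job (treat each record as one field, look at its first 2 characters, keep only data records) can be sketched in a few lines of shell. This is an illustration, not a replacement for the job as built: `data_records` is a hypothetical name, and the HR and TR markers are taken from the earlier post.

```shell
# Hypothetical helper: print only the data records from the given files,
# dropping any line whose first two characters are HR or TR.
data_records() {
    # substr($0, 1, 2) isolates the first two characters of the whole record;
    # the pattern after the action prints lines where it is neither marker.
    awk '{ tag = substr($0, 1, 2) } tag != "HR" && tag != "TR"' "$@"
}
```

Because it accepts several file names at once, one call such as `data_records /data/in/*.csv` (hypothetical path) covers every input file, and its output can be piped to another command rather than written to an intermediate file.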
-
Last edited by Sainath.Srinivasan on Fri Jun 05, 2009 5:36 am, edited 1 time in total.
-
Code:
egrep -v '^HR|^TR' yourFileName
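One thing to watch with a prefix match like this is that any data record which happens to begin with HR or TR is dropped as well. A minimal sketch of a sanity check follows, assuming each file should contain exactly one header and one trailer; `check_strip_counts` is a hypothetical name, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical check: each file should have exactly two lines matching the
# header/trailer pattern; more than two suggests data records match as well.
check_strip_counts() {
    for f in "$@"; do
        # grep -c prints the number of matching lines; "|| true" keeps a
        # zero count (exit status 1) from being treated as a failure.
        removed=$(grep -E -c '^(HR|TR)' "$f" || true)
        if [ "$removed" -ne 2 ]; then
            echo "unexpected: $f has $removed header/trailer lines" >&2
            return 1
        fi
    done
}
```

Running the check before the filter, for example `check_strip_counts yourFileName && grep -E -v '^(HR|TR)' yourFileName`, stops the filter from silently discarding data records.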