Stripping header and trailer record from input files


Moderators: chulett, rschirm, roy

wjfitzgerald
Participant
Posts: 72
Joined: Tue Feb 05, 2008 4:38 am

Post by wjfitzgerald »

Hi,

There are many topics out there which almost answer my question, but none which actually do. Hopefully someone will be good enough to give me a hand.

I have a job that processes a number of files on each run. The job currently starts with a sequential file read by file pattern. Each file consists of a header record (record number 1), a trailer record (the last record), and data records in between. The data records are comma delimited.

When I run the job, the read rejects the headers and trailers as they do not match the metadata. This currently writes a number of warnings to the log which, if I am processing enough files, blows the warning limit and aborts the job.

I can reject the header and trailer of course, but this still writes the warnings to the log. Is there some way of stopping the warnings being written to the log?

Alternatively I can preprocess the file to remove the header and trailer. Unfortunately, as the read is by file pattern, the file filter option is not available and so I cannot use a sed command to remove the first and last record.
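
For reference, the sed preprocessing mentioned above could look something like the sketch below, run outside the job. It assumes every file has exactly one header line and one trailer line; the function name strip_first_last is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a file without its first and last line.
# Assumes exactly one header (line 1) and one trailer (the last line).
strip_first_last() {
    # 1d deletes the first line, $d deletes the last line.
    sed '1d;$d' "$1"
}
```

Called as strip_first_last somefile > somefile.data, this leaves only the data records in the new file. Both file names here are placeholders.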

I also tried to run the data through a filter stage, but that led to further data rejects in the initial read.

Could anyone save me from going bald by giving me a pointer or two please?

Thanks, as always.

John Fitz
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

If the number of files being read through the file pattern is equal to the number of nodes used, then you can use the row number and file name options in the sequential file stage and thereby filter the header and trailer in a transformer stage. Otherwise it's better to write a unix script and call it from a before-job routine.
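
A minimal sketch of the kind of unix script meant here, assuming each input file has exactly one header line and one trailer line. The function name and the idea of a separate staging directory are assumptions for illustration, not something specified in this thread.

```shell
#!/bin/sh
# Hypothetical before-job helper: for every input file, write a copy without
# its first and last line into a staging directory.
# Arguments: $1 = staging directory, remaining arguments = input files.
stage_stripped_copies() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    for f in "$@"; do
        [ -f "$f" ] || continue    # skip a glob pattern that matched nothing
        sed '1d;$d' "$f" > "$out_dir/$(basename "$f")" || return 1
    done
}
```

The job's file pattern would then point at the staging directory instead of the original files, so the sequential file stage only ever sees data records.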
Nag
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Apart from being the first and last row in a file, are there any identifiers to find the header and trailer in a file?
Nag
mail2hfz
Premium Member
Posts: 92
Joined: Thu Nov 16, 2006 8:51 am

Post by mail2hfz »

Maybe you can read the whole record as a single field and filter the header/trailer records downstream.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Yeah, this is the reason I asked about a header or trailer identifier. If there is no identifier other than being the first and last row of a file, we cannot filter them out in a transformer.
mail2hfz wrote: Maybe you can read the whole record as a single field and filter the header/trailer records downstream.
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Gain some information ahead of processing, particularly the line count in the file (wc -l command). You can use that in a Transformer stage (executing in sequential mode or in a server job) to filter on @INROWNUM.

Code:

@INROWNUM <> 1 And @INROWNUM <> paramLineCount
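
As a rough illustration of the idea, the sketch below gets the line count with wc -l and then applies the same "not row 1 and not the last row" test in awk. The function names are hypothetical, and how the count is handed to the job as paramLineCount is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a file, digits only.
# Reading from stdin keeps the file name out of wc's output.
line_count() {
    wc -l < "$1" | tr -d ' '
}

# Shell analogue of the constraint above: keep every line except the first
# and the one whose number equals the precomputed count.
drop_first_and_last() {
    n=$(line_count "$1")
    awk -v last="$n" 'NR != 1 && NR != last' "$1"
}
```

Note that wc -l counts newline characters, so a file whose trailer has no final newline reports one line fewer than it contains, and the trailer would then slip through the filter.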
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
wjfitzgerald
Participant
Posts: 72
Joined: Tue Feb 05, 2008 4:38 am

Post by wjfitzgerald »

Morning,

Thanks to all for the responses.
To answer a few of the queries raised:

1. The header is marked with HR in the first 2 characters and the trailer is marked with TR.
2. I have tried to read it as a single field but the read is rejecting all the data records.

Is it possible to turn off all the warning messages when writing to a reject file?

Regards,

John Fitz
arvind_ds
Participant
Posts: 428
Joined: Thu Aug 16, 2007 11:38 pm
Location: Manali

Post by arvind_ds »

You can set the warning limit to 999999, or select no limit, and try.
Arvind
wjfitzgerald
Participant
Posts: 72
Joined: Tue Feb 05, 2008 4:38 am

Post by wjfitzgerald »

Hi,

I created a separate job to preprocess the files. It reads each record in as one field, passes the data through a transformer to isolate the first 2 characters, then uses this new field in a filter stage to identify headers and trailers (it might be processing multiple input files), and finally writes the data records to a new sequential file.

I then modified the original job to read the new sequential file instead of the multiple input files.

This works, but I cannot but think that this is fairly inefficient, what with having to create a new file and subsequently delete the same file as part of the process.

Any thoughts on this workaround would be gratefully received.

Regards,

John Fitz
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Read all files with the file pattern option and, in the transformer constraints, specify: if field[1,2]='HR' or field[1,2]='TR' then pass the row to one link, else pass it to the output link. Now you have all the files within a single file without headers and trailers. Hope this is what you are looking for.
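
Outside DataStage, the same two-way routing can be sketched in awk, which may help when checking what the constraint should let through. This is only an analogue of the transformer logic, not DataStage code, and the function name and output file arguments are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: route records by their first two characters.
# $1 = file for data records, $2 = file for HR/TR records, rest = input files.
split_by_prefix() {
    data_out=$1
    ctl_out=$2
    shift 2
    # Create or empty both outputs so they exist even if nothing is routed.
    : > "$data_out"
    : > "$ctl_out"
    awk -v data="$data_out" -v ctl="$ctl_out" '
        { tag = substr($0, 1, 2) }
        tag == "HR" || tag == "TR" { print > ctl; next }
        { print > data }
    ' "$@"
}
```

Like the constraint, this tests only the first two characters, so a data record that happens to begin with HR or TR would be routed with the headers and trailers.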
Nag
wjfitzgerald
Participant
Posts: 72
Joined: Tue Feb 05, 2008 4:38 am

Post by wjfitzgerald »

Thanks for coming back to me.
That is more efficient in that it would save me the use of the filter stage.

Thanks for the suggestion.

Regards,

John Fitz
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

D U P L I C A T E
Last edited by Sainath.Srinivasan on Fri Jun 05, 2009 5:36 am, edited 1 time in total.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Code:

egrep -v '^HR|^TR' yourFileName
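
When several files are passed at once, grep prefixes each output line with the file name unless told otherwise, and recent GNU grep releases warn that egrep is obsolescent in favour of grep -E. A hedged variant for the multi-file case is sketched below; the function name is hypothetical, and -h is assumed to be available as it is in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: print the data records of all given files, dropping
# lines that start with HR or TR.
#   -E  extended regular expressions (what egrep provides)
#   -h  suppress the file-name prefix grep adds with multiple files
#   -v  invert the match
strip_hr_tr() {
    grep -Ehv '^(HR|TR)' "$@"
}
```

grep exits with status 1 when no line is selected, so a script that checks exit codes should treat that case as "no data records" rather than as a failure.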
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

To read as a single line set both the delimiter and quote characters to "none".