Header and footer identification in a sequential file

evee1 · Post by **evee1** » Thu May 26, 2011 12:10 am

I'm building a PX job that will processes a sequential file, containing a header, a footer and data records. I need to do reconciliation of one of the fields field in the file against a value in a footer.
In addition, the job is supposed to be generic, i.e. process files with different data format (but always having the same header and footer formats).

I was planning to read all records in using Sequential File stage and pass them through the transformer to identify header, footer and data records.

Then, in a case of the data records, I wanted to use Column Import stage to separate one input column into multiple output columns, according to the passed schema file and then pass the to the output DataSet with RPC (not sure yet whether it's possible).

I have a problem though with identification of the header and footer records. Header and footer records in all source files will start with H or T, but not all data records will have an indicator at the start of the record, and it potentially might happen that the first character of the first field will be H or T.
How could I deal with this issue?

One idea I've got is to extract the row count from the footer in a routine that is executed before this job (I'm doing it anyway for validation purposes), pass it to the job as a parameter and then use @INROWNUM to identify the first and the last record. But I think that in order for this to work I would have to restrict the transformer to run on one node only to make it run in a sequential mode. This is not ideal as I would like to take advantage of parallel processing since some data files can be very big.

But maybe someone has a better suggestion, that will be performance-friendly.

evee1 · Post by **evee1** » Thu May 26, 2011 12:22 am

I have just realized I could use the Row Number Column added in the Sequential File stage. Since I'm reading only one file at a time it will run in a sequential mode anyway.
I still need to pass the max numnber of data records, so still waiting for other great ideas

.

peddakkagari · Post by **peddakkagari** » Mon May 30, 2011 6:42 am

As you said use the Row Number Column in the Sequential File stage.
Then all your input records will be having Row_number
Connect Sequential File stage to transformer and use @INROWNUM to get the total count then connect the transformer to Filter stage there you put a condition like Row_number=1 or Row_number=inrownum_count
then it will return your first record and last record I hope.

Thanks,
Sreekanth