Reading Sequential File with Header and Footer

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

alexnirmal
Participant
Posts: 2
Joined: Thu Sep 02, 2004 4:10 am

Reading Sequential File with Header and Footer

Post by alexnirmal »

How do I read a sequential file like this, which contains a header and a footer? How do I skip those and read only the columns I need? The header is the name and the number of people, and the footer is a line of dashes. I just want the detail columns: col1, col2, col3 and col4.

Name :uselss
no of ppl :89


col1 col2 col3 col4
1 2 3 4
5 6 7 8


--------------------------------------------

Name :us
no of ppl :8df9


col1 col2 col3 col4
11 21 31 14
51 61 71 18

--------------------------------------------
ETL developer,
Satyam Computer Services Limited, India.
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

In your Transformer, add this constraint to the input from the Sequential File stage:

@INROWNUM > 2 (if the header is the first two lines)

This will omit the header. You can ignore the footer as well by extending the constraint:

@INROWNUM > 2 and col1 <> '------'

It seems a dirty way of doing it, though.

You could also put this in BASIC code if you wanted.
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

If the format is going to remain the same, that is, every file will have a header and footer like this:

Code:

Name :uselss 
no of ppl :89 


col1 col2 col3 col4 
1 2 3 4 
5 6 7 8 


-------------------------------------------- 
Then you can use a simple unix script to delete the header, footer and blank lines.

You can call that script from your job using the Before-job subroutine ExecSH. The script removes the header, footer and empty records, leaving the file like this:

Code:

col1 col2 col3 col4
1 2 3 4 
5 6 7 8
And you know how to read this file, which has a space as its delimiter.

This is one way of reading the file; there may be others. The only drawback is that you have to call a script that manipulates the file so that it becomes readable by your job.
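As an illustration, the cleanup described above could be sketched as a small awk filter (my sketch, not code from the thread; it assumes detail rows start with a digit, the footer is the dashed line, and the "col1 ..." header should be kept only once):

```shell
# Hypothetical pre-processing filter: drop the Name / "no of ppl" header lines,
# blank lines and the dashed footer, keeping the "col1 ..." column header once.
# Reads stdin, writes stdout.
strip_header_footer() {
  awk '
    /^col1/  { if (!seen++) print; next }  # column-header line: print only once
    /^[0-9]/ { print }                     # numeric detail rows: keep
    # Name / "no of ppl" lines, blanks and dashed lines fall through (dropped)
  '
}
```

A script wrapping a filter like this could then be called from the job's Before-job subroutine (ExecSH), as described above.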
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

I don't think what dnzl has suggested will work, as the problem is with reading the file, not with discarding the first two records.

The first problem is with the file format itself. If DataStage can read the file without any warnings, then the problem is solved.

So tricks like

@INROWNUM > 2 (if the header is the first two lines)

will not work.

Correct me if I am wrong.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

anupam wrote: I don't think what dnzl has suggested will work
:shock: anupam, you pointed out something I overlooked: the file format. But if there is a CR/LF at the end of each row, the header constraint should work; we used this process before we started validating headers and trailers.
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
alexnirmal
Participant
Posts: 2
Joined: Thu Sep 02, 2004 4:10 am

Post by alexnirmal »

But since the file is in this format, I cannot even read it; it gives an error like this and the job fails:

controlDfile..Sequential_File_0.DSLink40: nls_read_delimited() - row 1, column Field003
ETL developer,
Satyam Computer Services Limited, India.
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

That is what I pointed out.

Unless and until the file is readable by DataStage, no other trick can be used; it will always give this kind of warning while reading the file.

The best alternative is to delete the header, footer and blank lines before reading the file, making it readable.

Then a space delimiter can be used while reading the file.

Try this; it will work.

If you want to test it, don't write any script at first. Just delete the blank lines, header and footer using the vi editor and test that sample file. If that works, then you can write the script that deletes the header/footer/blank lines.

Try it out.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

Very simply, in BASIC you would code this algorithm:

open the file
loop until end of file
   read a record
   if it is a detail record then
      write it to the output file
   end if
repeat
close the file

OK, what happens here is that you loop through the file, testing whether each record is a detail record. This works if you have a CR/LF at the end of each row, because you are just testing the first value (col1). It happens in code rather than visually, as one would like, but sometimes code is the way to go.

Once you have run this code, you are left with a sequential file that you can run through DataStage transforms etc.
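As a rough illustration of the same loop outside BASIC, a shell filter along these lines (my sketch; the function name and the leading-digit test for detail records are assumptions, not dnzl's actual code) reads each record and writes out only the detail records:

```shell
# Sketch of the algorithm above as a shell filter: read each record from
# stdin, test whether it is a detail record (here: starts with a digit,
# i.e. testing the first value, col1), and write only those to stdout.
filter_details() {
  while IFS= read -r record; do
    case $record in
      [0-9]*) printf '%s\n' "$record" ;;  # detail record: write it out
      *)      : ;;                        # header/footer/blank: skip
    esac
  done
}
```

Run over the raw file, this leaves only the numeric detail rows for DataStage to read.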
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

alexnirmal wrote: But since the file is in this format, I cannot even read it; it gives an error like this and the job fails:

controlDfile..Sequential_File_0.DSLink40: nls_read_delimited() - row 1, column Field003
Within the Sequential File stage, on the Columns grid, there is scope for setting "missing columns" rules; scroll the Columns grid to the right to find them. Ignore the missing Field003 and Field004 in the header lines.
(Anupam, this has been used at least once at your site, in Feb 2003.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

As Ray suggested, you can use the "missing columns" rules to read the file. In fact, in one of our applications we have used this method, but somehow I am not very comfortable with the approach.

There is no technical reason for this; I just feel more comfortable with unix-level manipulation of the file.

I tested this method on the sample data alexnirmal provided, and I was able to read the file without any errors, though obviously some transformation logic then needs to be implemented.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.