How to count the number of records in a seq file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

How to count the number of records in a seq file

Post by gpbarsky »

Hi forumers.......

I need to know if there is an easy way to count how many records there are in a sequential file. And I have to make this count from within a BASIC job.

Any help will be appreciated.

Thanks in advance.
:)
Guillermo P. Barsky
Buenos Aires - Argentina
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Omitting error handling for clarity:


OpenSeq "pathname" To filevariable Then
   LineCount = 0
   Loop
   While ReadSeq Line From filevariable
      LineCount += 1
   Repeat
End
CloseSeq filevariable
Ans = LineCount
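For completeness, here is a sketch of the same counting loop with the omitted error handling filled back in. This is my own illustration, not part of the original post: it assumes the code is the body of a DataStage transform routine taking one argument, Pathname, and the convention of returning -1 on failure is mine.

```basic
* Sketch: the counting loop above with minimal error handling added.
* Assumes this is the body of a DataStage transform routine with one
* argument, Pathname. Returning -1 on failure is an assumed convention.
LineCount = 0
OpenSeq Pathname To FileVar Then
   Loop
   While ReadSeq Line From FileVar
      LineCount += 1
   Repeat
   CloseSeq FileVar
   Ans = LineCount
End Else
   * The file could not be opened (bad path, or no permission)
   Ans = -1
End
```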
Last edited by ray.wurlod on Tue Apr 19, 2005 5:54 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Ray:

Thank you. It worked fine; the only thing I had to change was the initialization of the variable LineCount: I set it to 0 to make it work correctly.

Have a nice day.
:)
Guillermo P. Barsky
Buenos Aires - Argentina
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A typo, thanks for catching it. I have edited the original code so that LineCount is correctly initialized to 0. (I was doing it in a noisy airport lounge!)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
palaniappan
Participant
Posts: 41
Joined: Wed Mar 05, 2003 1:28 am

Post by palaniappan »

Ray,

I tried this code with two different file encodings, UCS-2 and UTF-8. The UTF-8 file gives the correct number, but the UCS-2 file returns one row more than the actual count. Any suggestions?

Thanks, Pal.
ray.wurlod wrote:Omitting error handling for clarity:


OpenSeq "pathname" To filevariable Then
   LineCount = 0
   Loop
   While ReadSeq Line From filevariable
      LineCount += 1
   Repeat
End
CloseSeq filevariable
Ans = LineCount
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No idea. The character encoding should make no difference at all. Make sure (by inspection) that you don't have an extra empty line at the end of the UCS-2 file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Post by palmeal »

Would using OpenSeq and then a loop not be slow for really large and numerous files? Would it be better to break out to Unix, run wc -l on the file, and then use awk to pull out the first column ($1)?
It's probably best practice to keep control within DataStage, but I wonder what the thoughts are on this, as performance may be more important.
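For what it's worth, the shell approach can be driven from BASIC via DSExecute, the standard DataStage routine for running an operating-system command. Note that redirecting the file into wc ("wc -l < file") makes it print just the number with no filename, so the awk step is not needed. This is only a sketch; the variable names and the -1 failure value are illustrative:

```basic
* Sketch: count lines by shelling out to wc from DataStage BASIC.
* Redirecting stdin ("wc -l < file") suppresses the filename in the
* output, so no awk is required. Pathname is an assumed argument.
Command = "wc -l < " : Pathname
Call DSExecute("UNIX", Command, Output, SystemReturnCode)
If SystemReturnCode = 0 Then
   Ans = Trim(Output)
End Else
   * Command failed; -1 is an assumed error convention
   Ans = -1
End
```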
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I believe Ray posted the correct solution given the poster's information. If this were Unix then I believe Ray and others would certainly have given wc -l as an option, along with the option of using BASIC to read the file. A routine like the one posted, running in the background with no terminal I/O, is actually very efficient and can "chunk" through large files very quickly. Of course there comes a point (size) where some other method would be faster, but the poster did not indicate how big the source is, in either the number of rows or the width of each row.