How to count the number of records in a seq file

Posted: Mon Apr 18, 2005 3:44 pm
by gpbarsky
Hi forumers.......

I need to know if there is an easy way to count how many records there are in a sequential file. I have to do this count from within a BASIC job.

Any help will be appreciated.

Thanks in advance.
:)

Posted: Mon Apr 18, 2005 6:15 pm
by ray.wurlod
Omitting error handling for clarity:

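* Initialise the counter first so Ans is defined even if the open fails
LineCount = 0
OpenSeq "pathname" To filevariable Then
   Loop
   While ReadSeq Line From filevariable
      LineCount += 1  ;* one ReadSeq per record
   Repeat
   * Close only a file that was actually opened
   CloseSeq filevariable
End
Ans = LineCount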

Posted: Tue Apr 19, 2005 8:58 am
by gpbarsky
Ray:

Thank you. It worked fine; the only thing I had to change was the initialization of the variable LineCount: I set it to 0 to make it work.

Have a nice day.
:)

Posted: Tue Apr 19, 2005 5:55 pm
by ray.wurlod
A typo, thanks for catching it. I have edited the original code so that LineCount is correctly initialized to 0. (I was doing it in a noisy airport lounge!)

Posted: Thu Apr 21, 2005 6:05 am
by palaniappan
Ray,

I tried this code on two different file formats, UCS-2 and UTF-8. The UTF-8 file gives the correct number, but the UCS-2 file returns one more row than the actual number. Any suggestions?

Thanks, Pal.

Posted: Thu Apr 21, 2005 1:31 pm
by ray.wurlod
No idea. The character encoding should make no difference at all. Make sure (by inspection) that you don't have an extra empty line at the end of the UCS-2 file.
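One way to do that inspection from a Unix shell, sketched here as a hypothetical helper named `show_tail` (not part of DataStage), is to dump the last few bytes of the file with the POSIX `tail -c` and `od -c` commands. Two consecutive `\n` entries at the very end indicate a trailing empty line, which the ReadSeq loop would count as one more record. A UCS-2 file stores each character, including the newline, as two bytes, so the dump will also show `\0` next to each `\n`; if the file is being read byte by byte, a `\0` left after the last `\n` could look like one extra short line. That is only a hypothesis to check against the dump, not a confirmed cause.

```shell
#!/bin/sh
# Hypothetical helper: print the last 16 bytes of a file with od -c so
# line terminators and NUL bytes are visible (\n = newline, \0 = NUL).
show_tail() {
    tail -c 16 "$1" | od -c
}

# Example (the path is a placeholder):
#   show_tail /path/to/input.txt
```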

Posted: Fri Apr 22, 2005 1:14 am
by palmeal
Wouldn't using OpenSeq and then a loop be slow for really large or numerous files? Would it be better to break out to Unix, run wc -l on the file, and then use awk to pull out the number in column 1 ($1)?
It's probably best practice to keep control within DataStage, but I wonder what the thoughts are on this, as performance may be more important.
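A minimal sketch of the approach described above, written as a hypothetical shell function `count_records`: `wc -l` prints the count followed by the file name, and `awk '{print $1}'` keeps only the first column (it also copes with the leading blanks some Unix versions of `wc` emit). Note that `wc -l` counts newline characters rather than records, so a final record with no trailing newline is not counted; assuming ReadSeq still returns such an unterminated last line, the two methods can differ by one on the same file.

```shell
#!/bin/sh
# Hypothetical helper: count newline-terminated records in a file.
# wc -l prints "<count> <filename>"; awk keeps only the first column.
count_records() {
    wc -l "$1" | awk '{print $1}'
}

# Example (the path is a placeholder):
#   count_records /path/to/input.txt
```

From a DataStage BASIC routine, a command like this could be run through `DSExecute` with the `"UNIX"` shell type and the count read back from its output; whether that is faster than the ReadSeq loop depends on the file's size and row width.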

Posted: Fri Apr 22, 2005 6:40 am
by mhester
I believe Ray posted the correct solution given the poster's information. If this were Unix, then I believe Ray and others would certainly have given wc -l as an option, along with the option of using BASIC to read the file. A routine like the one posted, running in the background with no terminal I/O, is actually very efficient and can "chunk" through large files very quickly. Of course, there comes a point (size) where some other method would be faster, but the poster did not indicate how big the source is, either in number of rows or in the width of each row.