How to count the number of records in a seq file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

How to count the number of records in a seq file

Post by gpbarsky »

Hi forumers.......

I need to know if there is an easy way to count how many records there are in a sequential file. And I have to make this count from within a BASIC job.

Any help will be appreciated.

Thanks in advance.
:)
Guillermo P. Barsky
Buenos Aires - Argentina
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Omitting error handling for clarity:


OpenSeq "pathname" To filevariable Then
   LineCount = 0
   Loop
   While ReadSeq Line From filevariable
      LineCount += 1
   Repeat
End
CloseSeq filevariable
Ans = LineCount
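For completeness, here is a sketch of the same counting loop with the omitted error handling filled back in. This is my own illustration, not part of the original post: it assumes the code is the body of a DataStage transform routine taking one argument, Pathname, and the convention of returning -1 on failure is mine.

```basic
* Sketch: the counting loop above with minimal error handling added.
* Assumes this is the body of a DataStage transform routine with one
* argument, Pathname. Returning -1 on failure is an assumed convention.
LineCount = 0
OpenSeq Pathname To FileVar Then
   Loop
   While ReadSeq Line From FileVar
      LineCount += 1
   Repeat
   CloseSeq FileVar
   Ans = LineCount
End Else
   * The file could not be opened (bad path, or no permission)
   Ans = -1
End
```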
Last edited by ray.wurlod on Tue Apr 19, 2005 5:54 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Ray:

Thank you. It worked fine; the only thing I had to change was the initialization of the variable LineCount: I set it to 0 to make it work correctly.

Have a nice day.
:)
Guillermo P. Barsky
Buenos Aires - Argentina
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A typo, thanks for catching it. I have edited the original code so that LineCount is correctly initialized to 0. (I was doing it in a noisy airport lounge!)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
palaniappan
Participant
Posts: 41
Joined: Wed Mar 05, 2003 1:28 am

Post by palaniappan »

Ray,

I tried this code with two different file encodings, UCS-2 and UTF-8. The UTF-8 file gives the correct number, but the UCS-2 file returns one row more than the actual count. Any suggestions?

Thanks, Pal.
ray.wurlod wrote:Omitting error handling for clarity:


OpenSeq "pathname" To filevariable Then
   LineCount = 0
   Loop
   While ReadSeq Line From filevariable
      LineCount += 1
   Repeat
End
CloseSeq filevariable
Ans = LineCount
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No idea. The character encoding should make no difference at all. Make sure (by inspection) that you don't have an extra empty line at the end of the UCS-2 file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Post by palmeal »

Would using OpenSeq and then a loop not be slow for really large and numerous files? Would it be better to break out to Unix, run wc -l on the file, and then use awk to pull out the first column ($1)?
It's probably best practice to keep control within DataStage, but I wonder what the thoughts are on this, as performance may be more important.
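For what it's worth, the shell approach can be driven from BASIC via DSExecute, the standard DataStage routine for running an operating-system command. Note that redirecting the file into wc ("wc -l < file") makes it print just the number with no filename, so the awk step is not needed. This is only a sketch; the variable names and the -1 failure value are illustrative:

```basic
* Sketch: count lines by shelling out to wc from DataStage BASIC.
* Redirecting stdin ("wc -l < file") suppresses the filename in the
* output, so no awk is required. Pathname is an assumed argument.
Command = "wc -l < " : Pathname
Call DSExecute("UNIX", Command, Output, SystemReturnCode)
If SystemReturnCode = 0 Then
   Ans = Trim(Output)
End Else
   * Command failed; -1 is an assumed error convention
   Ans = -1
End
```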
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I believe Ray posted the correct solution given the poster's information. If this were Unix then I believe Ray and others would certainly have given wc -l as an option, along with the option of using BASIC to read the file. A routine like the one posted, running in the background with no terminal I/O, is actually very efficient and can "chunk" through large files very quickly. Of course there comes a point (size) where some other method would be faster, but the poster did not indicate how big the source is, in either the number of rows or the width of each row.