Page 1 of 1

Counting the number of rows in a sequential file

Posted: Thu Feb 05, 2004 6:25 am
by bapajju
I have a sequential file as source and another sequential file as target. I want to capture the number of records of the source file in the target. Please lemme know how to achieve this.

Thanks
Bapajju

Re: Counting the number of rows in a sequential file

Posted: Thu Feb 05, 2004 6:45 am
by chulett
bapajju wrote:I want to capture the number of records of the source file in the target.
Depends on what this means exactly... Just the number of rows? Appended as a 'trailer' record? Mixed in with your other output?

One way is to use a UNIX script to run 'wc -l', capture the output and send it to the job as a parameter, where it would be available for the duration of the job.

There is also the @INROWCOUNT or @OUTROWCOUNT that can be leveraged, depending on your need, perhaps in a Stage variable.

Another method is to split another link off your transform with a simple piece of data in it (one field, @INROWNUM or even a constant), pass it thru an Aggregator set to SUM or LAST and then do something with the output. Remembering that you can't have two output streams to the same sequential file, you could run the second one to another file and combine the two in an after-job subroutine that calls another script.

Does any of that help?

Posted: Thu Feb 05, 2004 11:26 am
by jreddy
Elaborating on what Craig said,

Assuming you have only one input column in source sequential stage, just use a source sequential stage followed by an aggregator stage and then the target sequential stage. In the aggregator inputs tab, u need not sort or anything, but in outputs tab, select the derivation as Count(inputlinkname.columnname). This will return the number of rows in input file.
If you have more than one input column in source sequential stage, use another transformer in between the sequential and aggregator stage. And in that just pass one column of the source.

I had a similar situation in one of my jobs and this did the trick. Hope this helps you too.

Posted: Thu Feb 05, 2004 1:23 pm
by chulett
Thansk for the clarification. I meant COUNT when I said SUM, unless you have a need for SUM (which I recently did) that is.

I was doing something similar, except some rows needed to be counted and others didn't. So what I ended up passing to the Aggregator was a 1 for the rows that mattered and a 0 for the ones that didn't. The aggregation method was set to SUM and it worked like a charm for me.

Posted: Thu Feb 05, 2004 3:29 pm
by ray.wurlod
If your job has this design:

SeqFile --> Transformer --> SeqFile

then your job log has an "active stage finishing" event in which the row counts from the input and output links for the Transformer are recorded. This can be inspected from Director, or retrieved by the DSGetLog... functions.