Counting the number of rows in a sequential file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Counting the number of rows in a sequential file

Post by bapajju »

I have a sequential file as source and another sequential file as target. I want to capture the number of records of the source file in the target. Please let me know how to achieve this.

Thanks
Bapajju
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Counting the number of rows in a sequential file

Post by chulett »

bapajju wrote: I want to capture the number of records of the source file in the target.
Depends on what this means exactly... Just the number of rows? Appended as a 'trailer' record? Mixed in with your other output?

One way is to use a UNIX script to run 'wc -l', capture the output and send it to the job as a parameter, where it would be available for the duration of the job.
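If you'd rather keep it inside DataStage than use a wrapper script, a server routine can shell out with DSExecute and hand the number back. A rough, untested sketch (the FilePath argument is just an example):

   * Server routine with one argument, FilePath (could be fed from a job parameter).
   * Runs wc -l and returns the line count, or -1 if the command fails.
   Cmd = "wc -l < " : FilePath
   Call DSExecute("UNIX", Cmd, Output, SysRet)
   If SysRet = 0 Then Ans = Trim(Output<1>) Else Ans = -1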

There is also @INROWCOUNT or @OUTROWCOUNT that can be leveraged, depending on your need, perhaps in a Stage variable.
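As a tiny illustration (shown here with @INROWNUM, and the names are made up), a stage variable can carry the running count, so the last row out holds the total:

   Stage variable:  svRowCount    derivation:  @INROWNUM
   Output column:   SRC_ROW_CNT   derivation:  svRowCount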

Another method is to split another link off your Transformer with a simple piece of data in it (one field, @INROWNUM or even a constant), pass it through an Aggregator set to SUM or LAST and then do something with the output. Remembering that you can't have two output streams to the same sequential file, you could run the second link to another file and combine the two in an after-job subroutine that calls another script.
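For that last step, the after-job subroutine could be as simple as a DSExecute call that glues the two files together. Untested sketch, with made-up file names:

   * After-job subroutine sketch: append the one-row count file to the detail file.
   Cmd = "cat /data/out/details.txt /data/out/rowcount.txt > /data/out/final.txt"
   Call DSExecute("UNIX", Cmd, Output, SysRet)
   If SysRet <> 0 Then Call DSLogWarn("Combine step failed: " : Output, "CombineFiles")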

Does any of that help?
-craig

"You can never have too many knives" -- Logan Nine Fingers
jreddy
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

Elaborating on what Craig said,

Assuming you have only one input column in the source Sequential File stage, just use the source Sequential File stage followed by an Aggregator stage and then the target Sequential File stage. On the Aggregator's Inputs tab you do not need to sort or group anything, but on the Outputs tab set the derivation to Count(inputlinkname.columnname). This will return the number of rows in the input file.
If you have more than one input column in the source Sequential File stage, put a Transformer between the Sequential File and Aggregator stages and pass through just one column of the source.
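With made-up stage, link and column names, the design and the one derivation that matters would look roughly like this:

   SeqFile_Source --> Transformer --> Aggregator --> SeqFile_Target

   Aggregator Outputs tab:
     ROW_COUNT   derivation:  Count(lnkToAgg.ANY_COL)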

I had a similar situation in one of my jobs and this did the trick. Hope this helps you too.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks for the clarification. I meant COUNT when I said SUM, unless, that is, you actually need SUM (which I recently did).

I was doing something similar, except some rows needed to be counted and others didn't. So what I ended up passing to the Aggregator was a 1 for the rows that mattered and a 0 for the ones that didn't. The aggregation method was set to SUM and it worked like a charm for me.
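In Transformer/Aggregator terms it looked something like this (the condition and names here are invented just to show the shape):

   Transformer output column  CNT_FLAG    derivation:  If In.STATUS = 'A' Then 1 Else 0
   Aggregator output column   KEPT_ROWS   derivation:  Sum(lnkToAgg.CNT_FLAG)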
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If your job has this design:

SeqFile --> Transformer --> SeqFile

then your job log has an "active stage finishing" event in which the row counts from the input and output links for the Transformer are recorded. This can be inspected from Director, or retrieved by the DSGetLog... functions.
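Alternatively, if you want the number programmatically without parsing the log text, DSGetLinkInfo can retrieve the link row count directly, for example from an after-job routine. A sketch (the stage and link names are only placeholders):

   * After-job subroutine sketch: fetch the row count from the Transformer's output link.
   RowCount = DSGetLinkInfo(DSJ.ME, "xfmMain", "lnkToTarget", DSJ.LINKROWCOUNT)
   Call DSLogInfo("Rows written to target: " : RowCount, "RowCountCheck")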
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.