Counting no of records from multiple files

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Counting no of records from multiple files

Post by sjordery »

Hi all,

I have two sequentila file as source and I have to add the no of rows present in both the source files and store it in a target file.
There is no common key present in the both sequential files.

Can anybody suggest what should be the design approach?
Withou using routines is there any other method to achieve this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not to use DataStage at all. Use two wc -l commands wrapped in expr commands. Capture the results using backticks, and add within the scope of an outer expr command. Use output redirection to write to a target file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
baglasumit21
Participant
Posts: 132
Joined: Wed Mar 01, 2006 11:12 pm
Location: Pune

Re: Counting no of records from multiple files

Post by baglasumit21 »

sjordery wrote:Hi all,

I have two sequentila file as source and I have to add the no of rows present in both the source files and store it in a target file.
There is no common key present in the both sequential files.

Can anybody suggest what should be the design approach?
Withou using routines is there any other method to achieve this?
I think you can achieve it by adding a dummy column using a transformer having the value as 1 and then use an aggregator stage to get the sum for that dummy column. this sum will give the total row count
SMB
smrutiranjannayak
Participant
Posts: 22
Joined: Mon Apr 02, 2007 4:02 am
Location: MUMBAI
Contact:

R u want record counts for file 1 n file 2 in your output?

Post by smrutiranjannayak »

Do you want record counts for file 1 n file 2 in your output? Or you want physical records to be combined in out put file.......??

I fit so then use `wc -l` for file 1 and 2 then ......

If you want records then write a simple shell script as :

cat $I1 > $O1
cat $I2 >> $O1

call it after Job subroutine and take ExecSH option .

Cheers :lol:
Smrutiranjan Nayak
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

If both the files are in the same folder then you can use an external source stage with wc -l and give the file path as wc -l /filepath/*. using a wild character in the end gives u the row count of all the files in the folder. So the number of output records will be the same as the files in the folder. Then the sums counts can be aggregated.
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

External stage also returns the file name along with the row count. you will have to specify a second column with type varchar. i have used it once before. hope that helps.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Header date was not in your original specification. My original solution is inappropriate for the new specification.

You are being too complex. In particular you don't need any Sort stage and you only need one Transformer.

Code: Select all

SeqFile1  ----->  Transformer  ----->  Aggregator  ----->  Join  ----->  Target
                                                             |
                              SeqFile2  ------>  ColGen  ----+
The Transformer stage picks out the header date (If @INROWNUM = 1 Then InLink.Date Else svHeaderDate) and generates a constant for a join key. The Aggregator performs a count and passes through, or groups by, header date and join key. SeqFile2 uses wc -l as a Filter command. The Column Generator stage adds a constant for a join key. You could perform the count in the Transformer and omit the Aggregator if preferred.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

You can do this in a routine:
Open your first seq file and store the row count and header date in 2 variables:
Row_Cnt =Trim(Field(FileLine,',',1))
Header_ date = Trim(Field(FileLine,',',2))
Then open the Reject file and read the total count
Row_Count2
Finally write this to your output file:
Row_Cnt = Row_Cnt + Row_Count2

Write these values --> Row_Cnt, Header_ date
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
Post Reply