Counting no of records from multiple files

sjordery · Post by **sjordery** » Tue May 15, 2007 11:30 pm

Hi all,

I have two sequentila file as source and I have to add the no of rows present in both the source files and store it in a target file.
There is no common key present in the both sequential files.

Can anybody suggest what should be the design approach?
Withou using routines is there any other method to achieve this?

ray.wurlod · Post by **ray.wurlod** » Tue May 15, 2007 11:38 pm

Not to use DataStage at all. Use two wc -l commands wrapped in expr commands. Capture the results using backticks, and add within the scope of an outer expr command. Use output redirection to write to a target file.

baglasumit21 · Post by **baglasumit21** » Wed May 16, 2007 12:13 am

sjordery wrote:Hi all,

I have two sequentila file as source and I have to add the no of rows present in both the source files and store it in a target file.
There is no common key present in the both sequential files.

Can anybody suggest what should be the design approach?
Withou using routines is there any other method to achieve this?

I think you can achieve it by adding a dummy column using a transformer having the value as 1 and then use an aggregator stage to get the sum for that dummy column. this sum will give the total row count

smrutiranjannayak · Post by **smrutiranjannayak** » Wed May 16, 2007 12:29 am

Do you want record counts for file 1 n file 2 in your output? Or you want physical records to be combined in out put file.......??

I fit so then use `wc -l` for file 1 and 2 then ......

If you want records then write a simple shell script as :

cat $I1 > $O1
cat $I2 >> $O1

call it after Job subroutine and take ExecSH option .

Cheers

Maveric · Post by **Maveric** » Wed May 16, 2007 12:43 am

If both the files are in the same folder then you can use an external source stage with wc -l and give the file path as wc -l /filepath/*. using a wild character in the end gives u the row count of all the files in the folder. So the number of output records will be the same as the files in the folder. Then the sums counts can be aggregated.

Maveric · Post by **Maveric** » Wed May 16, 2007 12:48 am

External stage also returns the file name along with the row count. you will have to specify a second column with type varchar. i have used it once before. hope that helps.

ray.wurlod · Post by **ray.wurlod** » Wed May 16, 2007 3:28 pm

Header date was not in your original specification. My original solution is inappropriate for the new specification.

You are being too complex. In particular you don't need any Sort stage and you only need one Transformer.

Code: Select all

SeqFile1  ----->  Transformer  ----->  Aggregator  ----->  Join  ----->  Target
                                                             |
                              SeqFile2  ------>  ColGen  ----+

The Transformer stage picks out the header date (If @INROWNUM = 1 Then InLink.Date Else svHeaderDate) and generates a constant for a join key. The Aggregator performs a count and passes through, or groups by, header date and join key. SeqFile2 uses wc -l as a Filter command. The Column Generator stage adds a constant for a join key. You could perform the count in the Transformer and omit the Aggregator if preferred.

JoshGeorge · Post by **JoshGeorge** » Wed May 16, 2007 9:18 pm

You can do this in a routine:
Open your first seq file and store the row count and header date in 2 variables:
Row_Cnt =Trim(Field(FileLine,',',1))
Header_ date = Trim(Field(FileLine,',',2))
Then open the Reject file and read the total count
Row_Count2
Finally write this to your output file:
Row_Cnt = Row_Cnt + Row_Count2

Write these values --> Row_Cnt, Header_ date

DSXchange

Counting no of records from multiple files

Counting no of records from multiple files

Re: Counting no of records from multiple files

R u want record counts for file 1 n file 2 in your output?