
Counting the number of records from multiple files

Posted: Tue May 15, 2007 11:30 pm
by sjordery
Hi all,

I have two sequential files as source, and I have to add the number of rows present in both source files and store the result in a target file.
There is no common key present in the two sequential files.

Can anybody suggest a design approach?
Without using routines, is there any other method to achieve this?

Posted: Tue May 15, 2007 11:38 pm
by ray.wurlod
Not to use DataStage at all. Use two wc -l commands wrapped in expr commands. Capture the results using backticks, and add within the scope of an outer expr command. Use output redirection to write to a target file.
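A minimal sketch of that approach (the input and target file names here are placeholders):

```shell
# Sample input files standing in for the real sources:
printf 'rec1\nrec2\n' > file1.txt
printf 'recA\nrecB\nrecC\n' > file2.txt

# Count the lines in each file; backticks capture the command output.
# Reading via "<" keeps the file name out of wc's output.
count1=`wc -l < file1.txt`
count2=`wc -l < file2.txt`

# Add the two counts with expr and redirect the sum to the target file.
expr $count1 + $count2 > target.txt
```

After this runs, target.txt contains the combined count (5 for the sample files above).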

Re: Counting the number of records from multiple files

Posted: Wed May 16, 2007 12:13 am
by baglasumit21
sjordery wrote:Hi all,

I have two sequential files as source, and I have to add the number of rows present in both source files and store the result in a target file.
There is no common key present in the two sequential files.

Can anybody suggest a design approach?
Without using routines, is there any other method to achieve this?
I think you can achieve this by adding a dummy column with the value 1 in a Transformer stage, then using an Aggregator stage to sum that dummy column. The sum gives the total row count.

Do you want the record counts for file 1 and file 2 in your output?
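Outside DataStage, the same idea (tag every row with a constant 1 and sum that column) can be sketched with awk; the file names here are only examples:

```shell
# Sample inputs standing in for the two source files:
printf 'a\nb\n' > file1.txt
printf 'c\nd\ne\n' > file2.txt

# Every input row adds 1 to the running sum, mirroring the
# dummy-column-plus-Aggregator design; END prints the grand total.
awk '{ sum += 1 } END { print sum }' file1.txt file2.txt > target.txt
```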

Posted: Wed May 16, 2007 12:29 am
by smrutiranjannayak
Do you want record counts for file 1 and file 2 in your output, or do you want the physical records combined in the output file?

If it is the counts, then use `wc -l` on file 1 and file 2.

If you want the records themselves, write a simple shell script such as:

cat $I1 > $O1    # write the first input file to the output file
cat $I2 >> $O1   # append the second input file to it

Call it as an after-job subroutine using the ExecSH option.

Cheers :lol:

Posted: Wed May 16, 2007 12:43 am
by Maveric
If both files are in the same folder, you can use an External Source stage with wc -l, giving the file path as wc -l /filepath/*. Using a wildcard at the end gives you the row count of every file in the folder, so the number of output records equals the number of files in the folder. The counts can then be summed in an Aggregator.
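For reference, this is what wc -l with a wildcard produces on a sample folder (directory and file names are illustrative). Note the extra "total" line at the end, which you would need to skip or filter in the stage:

```shell
# Build a sample folder with two files:
mkdir -p srcdir
printf 'a\n' > srcdir/f1.txt
printf 'b\nc\n' > srcdir/f2.txt

# One output row per file (count plus file name), then a final
# "total" row summing all the counts:
wc -l srcdir/*
```

The output looks like "1 srcdir/f1.txt", "2 srcdir/f2.txt", "3 total" (column padding varies by platform).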

Posted: Wed May 16, 2007 12:48 am
by Maveric
The External Source stage also returns the file name along with the row count, so you will have to specify a second column of type VarChar. I have used it once before; hope that helps.

Posted: Wed May 16, 2007 3:28 pm
by ray.wurlod
Header date was not in your original specification. My original solution is inappropriate for the new specification.

You are being too complex. In particular you don't need any Sort stage and you only need one Transformer.

Code: Select all

SeqFile1  ----->  Transformer  ----->  Aggregator  ----->  Join  ----->  Target
                                                             |
                              SeqFile2  ------>  ColGen  ----+
The Transformer stage picks out the header date (If @INROWNUM = 1 Then InLink.Date Else svHeaderDate) and generates a constant for a join key. The Aggregator performs a count and passes through, or groups by, header date and join key. SeqFile2 uses wc -l as a Filter command. The Column Generator stage adds a constant for a join key. You could perform the count in the Transformer and omit the Aggregator if preferred.
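As a rough shell analogue of the SeqFile1 branch (not DataStage code, and the file layout is assumed): the header date is captured from row 1, and the row count is emitted at the end, for a comma-delimited file whose second field carries the date:

```shell
# Assumed source layout: each row is "data,header date".
printf 'r1,2007-05-16\nr2,2007-05-16\nr3,2007-05-16\n' > seqfile1.txt

# NR == 1 captures the header date (the Transformer's stage-variable
# trick); END emits date plus total count (the Aggregator's job).
awk -F',' 'NR == 1 { d = $2 } END { print d "," NR }' seqfile1.txt
# prints 2007-05-16,3
```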

Posted: Wed May 16, 2007 9:18 pm
by JoshGeorge
You can do this in a routine.
Open your first sequential file and store the row count and header date in two variables:

Row_Cnt = Trim(Field(FileLine, ',', 1))
Header_Date = Trim(Field(FileLine, ',', 2))

Then open the reject file and read the total count into Row_Count2.

Finally, add the counts and write the result to your output file:

Row_Cnt = Row_Cnt + Row_Count2

Write these values --> Row_Cnt, Header_Date
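The routine's arithmetic can be mimicked in shell, again with made-up file layouts (a "count,header date" file plus a bare reject count):

```shell
# Hypothetical inputs for illustration only:
printf '100,2007-05-16\n' > counts.txt
printf '25\n' > reject.txt

row_cnt=`cut -d',' -f1 counts.txt`      # like Field(FileLine, ',', 1)
header_date=`cut -d',' -f2 counts.txt`  # like Field(FileLine, ',', 2)
row_count2=`cat reject.txt`

# Row_Cnt = Row_Cnt + Row_Count2, then write both values out:
echo "`expr $row_cnt + $row_count2`,$header_date" > output.txt
```

Here output.txt would contain 125,2007-05-16.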