Hi all,
I have two sequentila file as source and I have to add the no of rows present in both the source files and store it in a target file.
There is no common key present in the both sequential files.
Can anybody suggest what should be the design approach?
Withou using routines is there any other method to achieve this?
Counting no of records from multiple files
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Not to use DataStage at all. Use two wc -l commands wrapped in expr commands. Capture the results using backticks, and add within the scope of an outer expr command. Use output redirection to write to a target file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 132
- Joined: Wed Mar 01, 2006 11:12 pm
- Location: Pune
Re: Counting no of records from multiple files
I think you can achieve it by adding a dummy column using a transformer having the value as 1 and then use an aggregator stage to get the sum for that dummy column. this sum will give the total row countsjordery wrote:Hi all,
I have two sequentila file as source and I have to add the no of rows present in both the source files and store it in a target file.
There is no common key present in the both sequential files.
Can anybody suggest what should be the design approach?
Withou using routines is there any other method to achieve this?
SMB
-
- Participant
- Posts: 22
- Joined: Mon Apr 02, 2007 4:02 am
- Location: MUMBAI
- Contact:
R u want record counts for file 1 n file 2 in your output?
Do you want record counts for file 1 n file 2 in your output? Or you want physical records to be combined in out put file.......??
I fit so then use `wc -l` for file 1 and 2 then ......
If you want records then write a simple shell script as :
cat $I1 > $O1
cat $I2 >> $O1
call it after Job subroutine and take ExecSH option .
Cheers
I fit so then use `wc -l` for file 1 and 2 then ......
If you want records then write a simple shell script as :
cat $I1 > $O1
cat $I2 >> $O1
call it after Job subroutine and take ExecSH option .
Cheers
Smrutiranjan Nayak
If both the files are in the same folder then you can use an external source stage with wc -l and give the file path as wc -l /filepath/*. using a wild character in the end gives u the row count of all the files in the folder. So the number of output records will be the same as the files in the folder. Then the sums counts can be aggregated.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Header date was not in your original specification. My original solution is inappropriate for the new specification.
You are being too complex. In particular you don't need any Sort stage and you only need one Transformer.
The Transformer stage picks out the header date (If @INROWNUM = 1 Then InLink.Date Else svHeaderDate) and generates a constant for a join key. The Aggregator performs a count and passes through, or groups by, header date and join key. SeqFile2 uses wc -l as a Filter command. The Column Generator stage adds a constant for a join key. You could perform the count in the Transformer and omit the Aggregator if preferred.
You are being too complex. In particular you don't need any Sort stage and you only need one Transformer.
Code: Select all
SeqFile1 -----> Transformer -----> Aggregator -----> Join -----> Target
|
SeqFile2 ------> ColGen ----+
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 612
- Joined: Thu May 03, 2007 4:59 am
- Location: Melbourne
You can do this in a routine:
Open your first seq file and store the row count and header date in 2 variables:
Row_Cnt =Trim(Field(FileLine,',',1))
Header_ date = Trim(Field(FileLine,',',2))
Then open the Reject file and read the total count
Row_Count2
Finally write this to your output file:
Row_Cnt = Row_Cnt + Row_Count2
Write these values --> Row_Cnt, Header_ date
Open your first seq file and store the row count and header date in 2 variables:
Row_Cnt =Trim(Field(FileLine,',',1))
Header_ date = Trim(Field(FileLine,',',2))
Then open the Reject file and read the total count
Row_Count2
Finally write this to your output file:
Row_Cnt = Row_Cnt + Row_Count2
Write these values --> Row_Cnt, Header_ date
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>