Merge Sequential Files
Team,
I have 13 files, each with 2 columns: the first column is an id and the second is a count. I want to combine all 13 files, group by the first column, and sum the count. I tried using a Link Collector to combine all the files into a single file and then an Aggregator to sum up, but it does not work. Any suggestions?
Thanks
Champa
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use one Sequential file stage with a filter command that uses cat (UNIX) or type or copy (Windows) to load all the rows from all the files into a single stream. Run this through the Aggregator stage. Even better would be to run the output of cat through sort (on the key) as part of the filter; this will mean your Aggregator stage runs MUCH faster.
Last edited by ray.wurlod on Wed Mar 15, 2006 5:50 am, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
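A minimal sketch of the idea above, run in a plain UNIX shell rather than inside DataStage. The file names, the comma delimiter, and the sample rows are assumptions for illustration, and awk is only a stand-in for the Aggregator stage:

```shell
#!/bin/sh
# Hypothetical sample data: two small "id,count" files standing in for the 13 inputs.
workdir=$(mktemp -d)
printf 'a,1\nb,2\n' > "$workdir/file1.txt"
printf 'b,3\na,4\nc,5\n' > "$workdir/file2.txt"

# cat writes every row of every file to stdout as a single stream,
# sort orders that stream on the first (key) column, and
# awk (standing in for the Aggregator) groups by id and sums the count.
# The trailing sort only makes the output order deterministic.
result=$(cat "$workdir"/file*.txt \
  | sort -t, -k1,1 \
  | awk -F, '{ sum[$1] += $2 } END { for (id in sum) print id "," sum[id] }' \
  | sort)

echo "$result"
rm -rf "$workdir"
```

In a job, only the cat and sort part would go into the filter command; the summing would be left to the Aggregator stage.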
Champa
You can use a filter command inside your Sequential File stage. By default it is not enabled. Go to the Stage tab inside the Sequential File stage and tick "Stage uses filter commands". This will allow you to use filter commands as Ray mentioned: use one Sequential File stage with a filter command that uses cat to load all the rows from all the files into a single stream.
Regards
Siva
Listening to the Learned
"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
-
- Premium Member
- Posts: 1255
- Joined: Wed Feb 02, 2005 11:54 am
- Location: United States of America
Once you enable "Stage uses filter commands", you will be able to pass any filter command on the 'Outputs' tab of your Sequential File stage.
For your case, as Ray suggested, you can concatenate (merge) your 13 files into one final file this way:
Code:
cat file1 file2 ....... file13 > finalfile
And pipe the output of this merge to a sort command so that the Aggregator stage runs faster.
HTH,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
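One general reason sorted input can help an aggregation is sketched below with awk, not with DataStage itself. Once rows are sorted on the key, all rows for one id are adjacent, so a total can be emitted as soon as the id changes. The file names, delimiter, and sample rows are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample data standing in for the files being merged.
workdir=$(mktemp -d)
printf 'b,2\na,1\n' > "$workdir/file1.txt"
printf 'a,4\nc,5\nb,3\n' > "$workdir/file2.txt"

# After sorting on the key, rows for the same id arrive together.
# The awk program keeps a running total for the current id only and
# prints it when the id changes, so it never holds more than one group.
totals=$(cat "$workdir/file1.txt" "$workdir/file2.txt" \
  | sort -t, -k1,1 \
  | awk -F, '
      NR > 1 && $1 != prev { print prev "," total; total = 0 }
      { prev = $1; total += $2 }
      END { if (NR > 0) print prev "," total }')

echo "$totals"
rm -rf "$workdir"
```

Whether the Aggregator stage works exactly this way internally is not stated in this thread; the sketch only illustrates why grouped input is cheaper to sum.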
That's right. Somehow we all have overlooked it. Thanks for correcting us lstsaur.
As kumar suggested, we can use TYPE instead of 'cat' or we can also use 'COPY' to merge multiple files in DOS.
Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
I have tried that using the TYPE command as well as the COPY command. It didn't work.
I used the following in the filter command:
COPY c:\temp\items.txt c:\temp\items1.txt c:\temp\to.txt
which resulted in the following error:
uniting..Transformer_2: |uniting..Sequential_File_0.DSLink3: DSD.SEQOpen No Filename to open..|
When you're using a filter, the Sequential File stage reads stdout from the filter directly (so you don't terminate the type command with a target file name, just type directly onto stdout). DataStage consumes these rows directly but must - for some strange reason - have the File Name property supplied with a value. It can be any valid file name, such as /dev/null (UNIX) or .\NUL (Windows).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
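The stdout point above can be sketched in a plain UNIX shell, outside DataStage. Command substitution plays the role of the stage reading the filter's stdout; the file names and sample rows are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample files; names are for illustration only.
workdir=$(mktemp -d)
printf 'a,1\n' > "$workdir/source1.txt"
printf 'b,2\n' > "$workdir/source2.txt"

# Filter that writes to stdout: the reader receives every row.
to_stdout=$(cat "$workdir/source1.txt" "$workdir/source2.txt")

# Filter that ends with a target file: the rows go to that file,
# stdout stays empty, and the reader receives nothing.
redirected=$(cat "$workdir/source1.txt" "$workdir/source2.txt" > "$workdir/target.txt")
target_rows=$(wc -l < "$workdir/target.txt" | tr -d ' ')

echo "$to_stdout"
echo "length of stdout when redirected: ${#redirected}"
rm -rf "$workdir"
```

This is why the filter command should not name a target file. The File Name property still needs a placeholder value such as /dev/null, as described above.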
A little bit off-topic but what I would do for the original problem would be to load all files to a database table and then use aggregation via SQL on the database using a DB2/Oracle stage. The aggregator stage doesn't work well with large input sets.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
ray.wurlod wrote: When you're using a filter, the Sequential File stage reads stdout from the filter directly (so you don't terminate the type command with a target file name, just type directly onto stdout). DataStage consumes these rows directly but must - for some strange reason - have the File Name property supplied with a value. It can be any valid file name, such as /dev/null (UNIX) or .\NUL (Windows).
Do you mean stdout or stdin? I didn't understand the part about termination and consumption.
My scenario is:
seq_source-->X-->seq_target
I have 2 files, c:\source1.txt and c:\source2.txt, and I need to copy them into c:\target.txt.
What do I need to write in Filter Command and File Name, on Windows and on UNIX?