Merge Sequential Files

Champa
Participant
Posts: 88
Joined: Wed Dec 14, 2005 1:44 pm

Post by Champa »

Team,

I have 13 files, each with 2 columns. In each file the first column is an id and the second is a count. I want to combine all 13 files, group by the first column, and sum the counts. I tried using a Link Collector to combine all the files into a single file and then an Aggregator to sum them up. It does not work. Any suggestions?

Thanks
Champa
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use one Sequential file stage with a filter command that uses cat (UNIX) or type or copy (Windows) to load all the rows from all the files into a single stream. Run this through the Aggregator stage. Even better would be to run the output of cat through sort (on the key) as part of the filter; this will mean your Aggregator stage runs MUCH faster.
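As a rough illustration of the filter described above, here is a minimal shell sketch for the UNIX case. The sample file names, the comma delimiter, and the scratch directory are assumptions made up for the example; they do not come from the original job.

```shell
# Hypothetical sample data: three small "id,count" files standing in for the 13 inputs.
workdir=$(mktemp -d)
printf 'b,2\na,1\n' > "$workdir/file1.txt"
printf 'a,4\nc,3\n' > "$workdir/file2.txt"
printf 'b,5\n'      > "$workdir/file3.txt"

# Candidate filter command: concatenate every file, then sort on the key (first) column.
# Nothing is redirected to a file; the rows are left on stdout for the stage to read.
sorted=$(cat "$workdir"/file*.txt | sort -t, -k1,1)
printf '%s\n' "$sorted"
```

In the job itself, only the `cat ... | sort ...` part would go into the filter command; the rest is scaffolding so the sketch can be run on its own.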
Last edited by ray.wurlod on Wed Mar 15, 2006 5:50 am, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Champa
Participant
Posts: 88
Joined: Wed Dec 14, 2005 1:44 pm

Post by Champa »

Ray,

Can you please expand on your first sentence?

Thanks.
Champa
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Champa
"Use one Sequential file stage with a filter command that uses cat to load all the rows from all the files into a single stream"
You can use a filter command inside your Sequential File stage. It is not enabled by default. Go to the Stage tab inside the Sequential File stage and tick "Stage uses filter commands". This will allow you to use filter commands as Ray mentioned.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear. Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
Champa
Participant
Posts: 88
Joined: Wed Dec 14, 2005 1:44 pm

Post by Champa »

Thanks Siva

I will try it out tomorrow
Champa
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Once you enable "Stage uses filter commands", you will be able to pass any filter command on the 'Outputs' tab of your Sequential File stage.

In your case, as Ray suggested, you can concatenate (merge) your 13 files into one final file this way:

Code:

cat file1 file2 ....... file13 > finalfile

Then pipe the output of this merge through a sort command so that the Aggregator stage runs faster.

HTH,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Hi Guys,
Champa's job is running on a Windows platform; there is no "CAT" command to concatenate the files. Am I missing something?
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

lstsaur wrote: Hi Guys,
Champa's job is running on a Windows platform; there is no "CAT" command to concatenate the files. Am I missing something?
Then 'TYPE' would help.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

That's right. Somehow we all overlooked it. Thanks for correcting us, lstsaur.

As kumar suggested, we can use TYPE instead of 'cat' or we can also use 'COPY' to merge multiple files in DOS.

Thanks,
Naveen.
vsi
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

I tried that using the TYPE command as well as the COPY command. It didn't work.

I used the following in the filter command:

COPY c:\temp\items.txt c:\temp\items1.txt c:\temp\to.txt


which resulted in the following error:
uniting..Transformer_2: |uniting..Sequential_File_0.DSLink3: DSD.SEQOpen No Filename to open..|
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What file name did you provide for the Sequential File stage to read?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When you're using a filter, the Sequential File stage reads stdout from the filter directly (so you don't terminate the type command with a target file name, just type directly onto stdout). DataStage consumes these rows directly but must - for some strange reason - have the File Name property supplied with a value. It can be any valid file name, such as /dev/null (UNIX) or .\NUL (Windows).
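To make the stdout point concrete, here is a small shell sketch (UNIX only) that contrasts a command ending in a target file with one that writes straight to stdout. The file names and contents are hypothetical, and the sketch only shows where the rows end up; it does not reproduce how DataStage itself invokes the filter.

```shell
# Hypothetical inputs; names and contents are made up for the illustration.
workdir=$(mktemp -d)
printf '1,10\n' > "$workdir/items.txt"
printf '2,20\n' > "$workdir/items1.txt"

# With a target file: the rows land in to.txt and stdout stays empty,
# so a reader attached to stdout would receive nothing.
redirected=$(cat "$workdir/items.txt" "$workdir/items1.txt" > "$workdir/to.txt")

# Without a target file: the rows are written to stdout, which is what the stage reads.
streamed=$(cat "$workdir/items.txt" "$workdir/items1.txt")

printf 'redirected: [%s]\n' "$redirected"
printf 'streamed:   [%s]\n' "$streamed"
```

The second form is the one that suits a filter command; the File Name property still needs a placeholder value as described above.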
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

A little bit off-topic but what I would do for the original problem would be to load all files to a database table and then use aggregation via SQL on the database using a DB2/Oracle stage. The aggregator stage doesn't work well with large input sets.

Ogmios
In theory there's no difference between theory and practice. In practice there is.
vsi
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

ray.wurlod wrote: When you're using a filter, the Sequential File stage reads stdout from the filter directly (so you don't terminate the type command with a target file name, just type directly onto stdout). DataStage consumes these rows directly but must - for some strange reason - have the File Name property supplied with a value. It can be any valid file name, such as /dev/null (UNIX) or .\NUL (Windows).
Do you mean stdout or stdin? I didn't understand the part about termination and consumption.

The scenario is like this:

seq_source-->X-->seq_target

I have 2 files, c:\source1.txt and c:\source2.txt.

I need to copy them into c:\target.txt.

What do I need to write in Filter Command and File Name on Windows and on UNIX?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Effectively, what happens is: filter_cmd | seq_source --> xfmr --> seq_target (yes, I DO mean stdout of the filter command).
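The same shape can be mimicked in a plain UNIX shell, purely as an analogy. In the sketch below, `filter_cmd` and `downstream` are hypothetical shell functions, the "id,count" layout and the comma delimiter are assumptions, and `awk` merely stands in for whatever the stages after the Sequential File stage do (here, the sum per id from the original question).

```shell
# Hypothetical stand-ins for c:\source1.txt and c:\source2.txt ("id,count" rows assumed).
workdir=$(mktemp -d)
printf 'a,1\nb,2\n' > "$workdir/source1.txt"
printf 'a,4\nc,3\n' > "$workdir/source2.txt"

# filter_cmd: writes the combined, key-sorted rows to stdout.
filter_cmd() {
  cat "$workdir/source1.txt" "$workdir/source2.txt" | sort -t, -k1,1
}

# downstream: stands in for the stages after seq_source; it sums the count per id.
# It relies on the input being sorted by id, so it only keeps one running total.
downstream() {
  awk -F, '
    NR > 1 && $1 != key { print key "," total; total = 0 }
    { key = $1; total += $2 }
    END { if (NR > 0) print key "," total }
  '
}

# filter_cmd | seq_source --> xfmr --> seq_target
filter_cmd | downstream > "$workdir/target.txt"
cat "$workdir/target.txt"
```

Because the rows arrive already sorted on the key, the consumer can emit each total as soon as the key changes, which is the reason sorting in the filter was suggested earlier in the thread.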