Merging files

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

I have 12 source files:

file_1, file_2, ..., file_12.

I want to merge them all into a single file, say 'file'.

Is there a stage in DataStage where I can give all the input file names and get the merged file as output?

Basically I need 'cat file_1 file_2 ... file_12 > file'.
Thank you,
Anu
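
For reference, the merge described above is a plain Unix concatenation: cat takes its inputs as space-separated arguments (not comma-separated) and writes them out in the order given. The minimal shell sketch below builds twelve throwaway files in a temporary directory (their contents are made up for illustration) and merges them into file. Listing the names explicitly keeps numeric order, which a glob such as file_* would not.

```shell
set -e

# Scratch directory so the demo does not touch real data.
work=$(mktemp -d)
cd "$work"

# Illustrative inputs: file_1 .. file_12, one line each.
i=1
while [ "$i" -le 12 ]; do
  echo "row from file_$i" > "file_$i"
  i=$((i + 1))
done

# cat arguments are separated by spaces, not commas.
# Naming the files explicitly keeps numeric order; file_* would sort the
# names as text (file_1 file_10 file_11 file_12 file_2 ...).
cat file_1 file_2 file_3 file_4 file_5 file_6 \
    file_7 file_8 file_9 file_10 file_11 file_12 > file

head -n 2 file
```
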
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Use the cat command in the Filter option of the Sequential File stage properties. A Funnel stage can also get you what you need.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.

Post by anu123 »

Thanks, guru.

If I go for the Funnel, I guess my job will look ugly with 12 source stages.
I could not understand the first option. If I am not wrong, in the Sequential File stage properties I would set

Options --> Filter = cat #path#file_1 #path#file_2 ... > #path#file

Then what would be the

Source --> File = ?

Thank you.

Post by DSguru2B »

Good question. I am not sure what you would put there. Do this instead: in the filter command, get rid of the redirection. Your filter command will look something like this:

cat #path#file_1 #path#file_2 #path#file_3 ... #path#file_12

In the File property, put #path#file_1.
I think the stage takes the output of the filter command and ignores the File name. I tried a test and it works.
NOTE: Even though the file name seems to be ignored, it has to be a valid file name. That's why I asked you to put #path#file_1, since that file must already exist for the filter command anyway.
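
Why the File name appears to be "neglected" has a plausible explanation, assuming the stage feeds the contents of the File property to the filter command on standard input and reads the command's standard output: cat ignores standard input whenever it is given file arguments. The shell sketch below only emulates that assumed plumbing with a redirection (it does not call DataStage); the file names and contents are hypothetical.

```shell
set -e
work=$(mktemp -d)
printf 'a1\n' > "$work/file_1"
printf 'b2\n' > "$work/file_2"
printf 'c3\n' > "$work/file_3"

# Assumed stage behaviour, emulated here: the File property (file_1) is
# fed to the filter on stdin, and whatever the filter writes to stdout
# becomes the stage's input rows.
# Because cat has file arguments, it never reads stdin, so file_1 is
# emitted once (as an argument), not twice.
cat "$work/file_1" "$work/file_2" "$work/file_3" < "$work/file_1" > "$work/stage_rows"
cat "$work/stage_rows"
```
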
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Even easier (and better performing, because you will get parallel operation) would be to use multiple File properties in the Sequential File stage, or to change the read mode to "File Pattern" and specify a regular expression to select the files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by anu123 »

Thanks, Ray.

I am unable to locate a 'Multiple File properties' option in the Sequential File stage, so I am trying the second option you gave me.

I changed 'Read Method' to 'File Pattern' and entered the files in the 'File Pattern' property as
#path#file_1.csv #path#file_2.csv (I used 2 files to test).

It throws this error:

Input buffer overrun at field "col2"

I know these are silly questions, but I am pretty new to PX. Thanks in advance.

Post by anu123 »

I figured it out. It was a data type issue. Thanks, all, for your valuable time.

Ray, would you mind elaborating a bit more on the 'Multiple File properties' approach?

Thanks a lot.

Post by DSguru2B »

Read about "File Pattern" under "Read Method" in the Sequential File stage help PDF. I would say it's a better way to handle your requirement. It also accepts wildcards and processes all the matching files.
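
Because File Pattern accepts wildcards, it can help to preview in a shell which files a pattern picks up before putting it in the stage. The sketch below assumes the stage's wildcards behave like ordinary shell globs (check the Sequential File stage documentation for the exact rules); the directory and file names are hypothetical.

```shell
set -e
work=$(mktemp -d)
for n in 1 2 10; do
  printf 'row %s\n' "$n" > "$work/file_$n.csv"
done
printf 'not a source file\n' > "$work/other.txt"

# List what a pattern such as file_*.csv would select.
# Globs sort matches as text, so file_10.csv can appear before file_2.csv.
count=0
for f in "$work"/file_*.csv; do
  echo "matched: ${f##*/}"
  count=$((count + 1))
done
echo "$count file(s) matched"
```

The non-matching other.txt is skipped, so only the three .csv files are counted.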

Post by anu123 »

Thank you.

Post by ray.wurlod »

Set Read Method to "Specific File".
Then you can add more than one "File" property, one per input file.

This is what I meant by multiple File properties.