multiple text files as input
Moderators: chulett, rschirm, roy
For one of our integration points the inputs will be a set of files sitting on a Unix server. We would like the DataStage process to input and process all of the files contained in the directory. Can anybody please advise which stage we should use to retrieve and process multiple text files as a batch?
Thanks in advance....
Hi,
One way is to use a file pattern as your source rather than a specific file path.
Search for it in the Parallel Job Developer's Guide (5-28).
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
Hi,
If you specify File Pattern as the read method in your Sequential File stage and use wildcards in your file path (e.g. #path#/*), you will read all files in the specified path.
IHTH,
Roy R.
There is another way of processing these files: execute the cat command, or write a shell script, to concatenate all the files into one and use the output file in the job. To use this method, the number of files in the directory must be static and the metadata of all the files must be the same.
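A minimal sketch of that concatenate-then-read approach. All paths and file names here are made up for illustration; a throwaway temp directory stands in for the real input directory:

```shell
# Minimal sketch of the concatenate-then-read approach (paths are made up).
set -e
src_dir=$(mktemp -d)            # stand-in for the real input directory
out_file="$src_dir/merged.txt"

# two sample input files with identical metadata (same layout)
printf 'a|1\nb|2\n' > "$src_dir/in1.txt"
printf 'c|3\n'      > "$src_dir/in2.txt"

# concatenate every input file into the single file the job will read
cat "$src_dir"/in*.txt > "$out_file"

wc -l < "$out_file"
```

Note the output file deliberately doesn't match the `in*.txt` wildcard, so a re-run won't concatenate the merged file into itself.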
HTWH.
Regards
Saravanan
Hi Saravanan,
Will this command work for my solution if I put it in a before-job subroutine:
ExecDOS copy /apps/Ascential/Projects/IDD/*.txt /a /apps/Ascential/Projects/IDD/ADTest.txt /a
Or should I write a shell script and execute it in the before-job subroutine:
ExecSH merge.sh
Which of the above is better at concatenating all the input files in the correct data format, without any discrepancies?
My second question: is there any limit on the amount of data that can be concatenated into a single text file?
Third question: if the file extension is the same but the file format is different (like a different number of columns, or fixed-width vs. delimited), is there any way to process the files in a directory?
I think I am crossing my fingers...![Rolling Eyes :roll:](./images/smilies/icon_rolleyes.gif)
Hi,
Performing a type #PATH1#\*.* > #PATH2#\Myfile
will be far slower (the more data, the slower it gets)
than simply reading the files and processing the read buffer!
That is my humble opinion, and probably all I have to say for this post ![Smile :)](./images/smilies/icon_smile.gif)
Good Luck,
Roy R.
Why go to all this trouble (and double the disk space needed) when the stage supports automatically reading multiple files with a wildcard pattern?
![Confused :?](./images/smilies/icon_confused.gif)
pongal wrote: is there any limit for concatenating the data for a single text file?
On Windows? Disk space.
pongal wrote: if the file extension is the same but the file format is different (like a different number of columns, or fixed-width vs. delimited), is there any way to process the files in a directory?
Not any way noted so far in this thread. You'll need jobs to get the various files into a common format before you can consider doing anything like processing them all at once.
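To illustrate the "common format first" point, here is a hedged awk sketch. The column widths, field names, and delimiter are all invented; it just shows a fixed-width file being rewritten into the same pipe-delimited layout the other files might use, before anything tries to process them together:

```shell
# Invented example: normalize a fixed-width file (4-char id, 8-char name)
# into a pipe-delimited layout so it matches the other input files.
set -e
work=$(mktemp -d)
printf '0001Smith   \n0002Jones   \n' > "$work/fixed.txt"

awk '{
    id   = substr($0, 1, 4)
    name = substr($0, 5, 8)
    sub(/ +$/, "", name)         # strip the fixed-width padding
    print id "|" name
}' "$work/fixed.txt" > "$work/delimited.txt"

cat "$work/delimited.txt"
```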
-craig
"You can never have too many knives" -- Logan Nine Fingers
OK, I have more to say regarding Craig's note.
In some cases you have multiple files of the same schema, so my idea should work for that.
In case you also have multiple schemas for the files, you have 2 options:
1. export/import stages and, further along, a split-according-to-type approach.
2. a good naming convention, or a translation table stating which job to run for each file.
But what exactly is the nature of the files (what do you need to do with them)?
Let's say you get the file list via FTP; what next?
Are they all the same data format, and should they be processed the same way?
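A rough sketch of the naming-convention option: map each incoming file name to the job that should process it, then hand it off. Every name here (the prefixes, the job names, the project) is hypothetical, and the actual `dsjob -run` hand-off is left as a comment since it only works on a DataStage server:

```shell
# Hypothetical naming convention: the file prefix decides which job runs.
job_for_file() {
    case "$(basename "$1")" in
        CUST_*) echo LoadCustomers ;;   # hypothetical job name
        ORD_*)  echo LoadOrders ;;      # hypothetical job name
        *)      echo UNKNOWN ;;
    esac
}

# Driver loop; on a real server the echo would become something like:
#   dsjob -run -param InputFile="$f" MyProject "$(job_for_file "$f")"
for f in CUST_20240101.txt ORD_20240101.txt misc.txt; do
    echo "$f -> $(job_for_file "$f")"
done
```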
IHTH,
Roy R.
Another option if the names are consistent - you can use a fileset to pull them in. You may have 100 different files with a .dat extension, containing multiple layouts, but the fileset only grabs those files you have specified.
The fileset is basically a list of files to process. Here's an example:

```
--Orchestrate File Set v1
--LFile
edwdev1:/u001/data/ddatd/dly_source1.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source2.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source3.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source4.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source5.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source6.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source7.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source8.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source9.ebc
```

Otherwise, the file-naming conventions Roy mentioned should do the trick.
HTH.
Brad.
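If the file list changes from run to run, a control file in that shape could be regenerated just before the job. A sketch only: a temp directory stands in for the real data directory, the host prefix `edwdev1` is borrowed from the example above, and whether the stage accepts a hand-built control file like this is worth verifying against the documentation:

```shell
# Sketch: rebuild a fileset control file from whatever .ebc files exist.
set -e
dir=$(mktemp -d)                 # stand-in for the real data directory
touch "$dir/dly_source1.ebc" "$dir/dly_source2.ebc"

fileset="$dir/daily.fs"
echo "--Orchestrate File Set v1" > "$fileset"
for f in "$dir"/*.ebc; do
    # one --LFile entry per file, prefixed with the (assumed) host name
    printf -- '--LFile\nedwdev1:%s\n' "$f" >> "$fileset"
done

cat "$fileset"
```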