
multiple text files as input

Posted: Sun Jul 10, 2005 6:10 am
by pongal
For one of our integration points, the inputs will be a set of files sitting on a Unix server. We would like the DataStage process to read and process all of the files contained in the directory. Can anybody please advise which stage we should use to retrieve and process multiple text files as a batch?
Thanks in advance....

Posted: Sun Jul 10, 2005 6:19 am
by roy
Hi,
One way is to use a file pattern as your source rather than a specific file path.
Search for it in the Parallel Job Developer's Guide (page 5-28).

IHTH,

Posted: Sun Jul 10, 2005 6:44 am
by pongal
Hi roy,
I am not understanding the file pattern here.
Currently I don't have the DS Parallel Job Developer's Guide PDF.
Can you please explain it in detail?
Thanks

Posted: Sun Jul 10, 2005 7:17 am
by roy
Hi,
If you set the Read Method in your Sequential File stage to File Pattern and use wildcards in the file path (e.g. #path#/*), you will read all the files in the specified path.
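For example, the relevant Sequential File stage properties would look something like this (the .txt extension is just an assumption about your file names, and the exact property labels can vary by version):

Code: Select all

Source:
   Read Method  = File Pattern
   File Pattern = #path#/*.txt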

IHTH,

Posted: Sun Jul 10, 2005 7:40 am
by elavenil
There is another way of processing these files: execute the cat command, or write a shell script to concatenate all the files into one, and use the output file in the job. To implement this method, the number of files in the directory must be static and the metadata of these files must be the same.
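A minimal sketch of such a script, assuming all the files share the same layout (the directory and file names below are only placeholders you would adjust):

Code: Select all

#!/bin/sh
# Concatenate every .txt input file in the source directory into one file
# for the job to read. Directory and output name are placeholders.
SRC_DIR=/apps/Ascential/Projects/IDD
OUT_FILE=$SRC_DIR/combined_input.dat   # different extension, so the merged
                                       # file is never picked up by *.txt
rm -f "$OUT_FILE"
cat "$SRC_DIR"/*.txt > "$OUT_FILE"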

HTWH.

Regards
Saravanan

Posted: Sun Jul 10, 2005 8:22 am
by pongal
Hi Saravanan,
Will this command work for my situation if I put it in a before-job subroutine as
ExecDOS copy /apps/Ascential/Projects/IDD/*.txt /a /apps/Ascential/Projects/IDD/ADTest.txt /a
or shall I write a shell script and execute it in a before-job subroutine as
ExecSH merge.sh
Which of the above is better at concatenating all the input files in the correct data format, without any discrepancies?
And my second question: is there any limit on how much data can be concatenated into a single text file?

Third question: if the file extension is the same but the file format is different (for example a different number of columns, or fixed-width vs. delimited), is there any way to process the files in a directory?
I think I am crossing my fingers...... :roll:

Posted: Sun Jul 10, 2005 8:39 am
by roy
Hi,
Performing something like
type #PATH1#\*.* > #PATH2#\Myfile
will be far slower (the more data, the slower it gets) than simply reading the files and processing the read buffer!

That is my humble opinion and probably all I have to say for this post :)

Good Luck,

Posted: Sun Jul 10, 2005 8:42 am
by chulett
Why go to all this trouble (and double the disk space needed) when the stage supports automatically reading multiple files with a wildcard pattern? :?
pongal wrote:is there any limit on how much data can be concatenated into a single text file?
On Windows? Disk space.
pongal wrote:if the file extension is the same but the file format is different (for example a different number of columns, or fixed-width vs. delimited), is there any way to process the files in a directory?
Not by any method noted so far in this thread. You'll need jobs to get the various files into a common format before you can consider doing anything like processing them all at once.

Posted: Sun Jul 10, 2005 8:49 am
by roy
OK, I have more to say regarding Craig's note.

In some cases you have multiple files with the same schema, so my idea should work for that.

If you also have multiple schemas for the files, you have two options:

1. Use export/import stages and split the records according to type further along.
2. Use a good naming convention, or a translation table stating which job to run for each file (see the sketch below).
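As a rough illustration of option 2 only (this is a sketch, not anything from the documentation; the lookup file, project name, job names and parameter name are all hypothetical), a small driver script could look up which job handles which file pattern and launch it with dsjob:

Code: Select all

#!/bin/sh
# job_lookup.txt maps a file-name pattern to the job that reads that layout, e.g.:
#   cust_*.txt   LoadCustomers
#   ord_*.txt    LoadOrders
while read pattern jobname; do
    # Expand the pattern against the incoming directory and run the matching job
    for f in /apps/Ascential/Projects/IDD/$pattern; do
        [ -f "$f" ] || continue
        dsjob -run -jobstatus -param SourceFile="$f" IDD "$jobname"
    done
done < job_lookup.txt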

But what exactly is the nature of the files (what do you need to do with them???)

Let's say you get the list of files via FTP; what next?
Are they all the same data format, and should they all be processed the same way?

IHTH,

Posted: Mon Jul 11, 2005 11:00 am
by bcarlson
Another option, if the names are consistent, is to use a file set to pull them in. You may have 100 different files with a .dat extension containing multiple layouts, but the file set only grabs the files you have specified.

The fileset is basically a list of files to process. Here's an example:

Code: Select all

--Orchestrate File Set v1
--LFile
edwdev1:/u001/data/ddatd/dly_source1.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source2.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source3.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source4.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source5.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source6.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source7.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source8.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source9.ebc
Otherwise, the file-naming conventions Roy mentioned should do the trick.

HTH.
Brad.