multiple text files as input

pongal · Post by **pongal** » Sun Jul 10, 2005 6:10 am

For one of our integration points, the inputs will be a set of files sitting on a Unix server. We would like the DataStage process to read and process all of the files contained in the directory. Can anybody please advise which stage we should use to retrieve and process multiple text files as a batch?
Thanks in advance.

roy · Post by **roy** » Sun Jul 10, 2005 6:19 am

Hi,
One way is to use a file pattern as your source rather than a specific file path.
Search for it in the Parallel Job Developer's Guide (5-28).

IHTH,

pongal · Post by **pongal** » Sun Jul 10, 2005 6:44 am

Hi roy,
I don't understand what the file pattern is here.
I don't currently have the DS Parallel Job Developer's Guide PDF.
Can you please explain in detail?
Thanks

roy · Post by **roy** » Sun Jul 10, 2005 7:17 am

Hi,
If you set the read method in your Sequential File stage to File Pattern and use wildcards in your file path (e.g. #path#/*), you will read all files in the specified path.
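
To check which files a wildcard pattern would pick up before pointing a job at it, a quick test outside DataStage can help. Below is a minimal Python sketch, assuming a throwaway directory and made-up file names; `files_matching` is a hypothetical helper that only mimics shell-style wildcard matching and says nothing about how the stage itself is implemented.

```python
import glob
import os
import tempfile


def files_matching(pattern):
    """Return the regular files matched by a shell-style wildcard pattern, sorted by name."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


# Demo on a throwaway directory (file names are made up for illustration).
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "notes.log"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("x\n")

all_files = files_matching(os.path.join(demo_dir, "*"))
txt_files = files_matching(os.path.join(demo_dir, "*.txt"))
print([os.path.basename(p) for p in txt_files])
```

A pattern of `*` takes everything in the directory, so a narrower pattern such as `*.txt` is usually safer when other files may land in the same place.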

IHTH,

elavenil · Post by **elavenil** » Sun Jul 10, 2005 7:40 am

There is another way of processing these files: run the cat command, or write a shell script, to concatenate all the files into one, and use the output file in the job. For this method to work, the number of files in the directory must be static and the metadata of these files must be the same.
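
The concatenation idea can be sketched as follows. This is only an illustration in Python, assuming comma-delimited text files; `concatenate` is a hypothetical helper, and its column-count check is a crude stand-in for "the metadata must be the same".

```python
import glob
import os
import tempfile


def concatenate(pattern, target, delimiter=","):
    """Append every file matching pattern to target, in name order.

    Raises ValueError if a line's column count differs from the first line seen.
    The target itself is skipped, so a merged file that lives in the same
    directory and matches the pattern is never appended to itself.
    """
    target_abs = os.path.abspath(target)
    sources = [p for p in sorted(glob.glob(pattern))
               if os.path.abspath(p) != target_abs]
    expected = None
    with open(target, "w") as out:
        for path in sources:
            with open(path) as src:
                for line in src:
                    columns = line.rstrip("\n").count(delimiter) + 1
                    if expected is None:
                        expected = columns
                    elif columns != expected:
                        raise ValueError(
                            f"{path}: expected {expected} columns, got {columns}")
                    # Make sure every record ends with a newline before the next file starts.
                    out.write(line if line.endswith("\n") else line + "\n")
    return sources


# Demo on a throwaway directory (file names and contents are made up).
demo_dir = tempfile.mkdtemp()
for name, body in (("part1.txt", "1,a\n2,b\n"), ("part2.txt", "3,c")):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(body)

merged = os.path.join(demo_dir, "merged.txt")
used = concatenate(os.path.join(demo_dir, "*.txt"), merged)
with open(merged) as handle:
    merged_text = handle.read()
```

Note that the merged copy takes as much disk space again as the inputs, and a failed check leaves a partial output file behind that should be removed before rerunning.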

HTWH.

Regards
Saravanan

pongal · Post by **pongal** » Sun Jul 10, 2005 8:22 am

Hi Saravanan,
Would this command work for my case if I put it in the before-job subroutine as
ExecDOS copy /apps/Ascential/Projects/IDD/*.txt /a /apps/Ascential/Projects/IDD/ADTest.txt /a
Or should I write a shell script and execute it in the before-job subroutine?
ExecSH merge.sh
Which of the above commands is better for concatenating all the input files in the correct data format without any discrepancies?
And my second question: is there any limit for concatenating the data for a single text file?

Third question: if the file extension is the same but the file format is different (e.g. a different number of columns, or fixed-width versus delimited), is there any way to process the files in a directory?
I think I am crossing my fingers...

roy · Post by **roy** » Sun Jul 10, 2005 8:39 am

Hi,
Performing a type #PATH1#\*.* > #PATH2#\Myfile
will be far slower (the more data, the slower it will be)
than simply reading the files and processing the read buffer!

That is my humble opinion and probably all I have to say for this post.

Good Luck,

chulett · Post by **chulett** » Sun Jul 10, 2005 8:42 am

Why go to all this trouble (and double the disk space needed) when the stage supports automatically reading multiple files with a wildcard pattern?

pongal wrote: is there any limit for concatenating the data for a single text file?

On Windows? Disk space.

pongal wrote: if the file extension is the same but the file format is different (e.g. a different number of columns, or fixed-width versus delimited), is there any way to process the files in a directory?

Not in any way noted so far in this thread. You'll need jobs to get the various files into a common format before you can consider doing anything like processing them all at once.

roy · Post by **roy** » Sun Jul 10, 2005 8:49 am

OK, I have more to say regarding Craig's note.

In some cases you have multiple files with the same schema, so my idea should work for that.

In case you also have multiple schemas for the files, you have 2 options:

1. Use export/import stages and, further along, split according to type.
2. Use a good naming convention, or a translation table stating which job to run for each file.
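
The second option, a translation table, might look something like the sketch below. It is plain Python for illustration only; the file-name patterns, the job names and the helpers `job_for` and `plan` are all made up, and actually launching the jobs is left out.

```python
import fnmatch

# Hypothetical translation table: file-name pattern -> name of the job that handles it.
JOB_FOR_PATTERN = [
    ("cust_*.txt", "LoadCustomers"),
    ("ord_*.txt", "LoadOrders"),
]


def job_for(filename, table=JOB_FOR_PATTERN):
    """Return the job name for the first pattern that matches filename, or None."""
    for pattern, job in table:
        if fnmatch.fnmatchcase(filename, pattern):
            return job
    return None


def plan(filenames, table=JOB_FOR_PATTERN):
    """Group file names by job; names with no matching pattern go under the None key."""
    grouped = {}
    for name in filenames:
        grouped.setdefault(job_for(name, table), []).append(name)
    return grouped


print(plan(["cust_01.txt", "ord_01.txt", "cust_02.txt", "readme.md"]))
```

Keeping the unmatched names under their own key makes it easy to spot files that arrived without a job assigned to them.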

But what exactly is the nature of the files (what do you need to do with them)?

Let's say you get the file list via FTP; what next?
Are they all the same in data format, and should they be processed the same way?

IHTH,

bcarlson · Post by **bcarlson** » Mon Jul 11, 2005 11:00 am

Another option, if the names are consistent: you can use a fileset to pull them in. You may have 100 different files with a .dat extension, containing multiple layouts, but the fileset only grabs those files you have specified.

The fileset is basically a list of files to process. Here's an example:

--Orchestrate File Set v1
--LFile
edwdev1:/u001/data/ddatd/dly_source1.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source2.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source3.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source4.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source5.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source6.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source7.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source8.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source9.ebc
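
If the list is long, a listing in this layout could be generated rather than typed. The Python sketch below only reproduces the lines shown in the example above (a header line, then a --LFile marker before each host:path entry); it is an assumption that nothing more is needed, a real fileset may carry further information, and `fileset_listing` is a hypothetical helper.

```python
def fileset_listing(host, paths):
    """Build text in the layout shown above: a header, then --LFile and host:path per file."""
    lines = ["--Orchestrate File Set v1"]
    for path in paths:
        lines.append("--LFile")
        lines.append(f"{host}:{path}")
    return "\n".join(lines) + "\n"


# Host and directory are taken from the example above.
paths = [f"/u001/data/ddatd/dly_source{n}.ebc" for n in range(1, 10)]
listing = fileset_listing("edwdev1", paths)
print(listing, end="")
```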

Otherwise, the file-naming conventions Roy mentioned should do the trick.

HTH.
Brad.