multiple text files as input

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pongal
Participant
Posts: 77
Joined: Thu Mar 04, 2004 4:46 am

Post by pongal »

For one of our integration points, the inputs will be a set of files sitting on a Unix server. We would like the DataStage process to read and process all of the files in the directory. Can anybody please advise which stage we need to use to retrieve and process multiple text files as a batch?
Thanks in advance.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
One way is to use a file pattern as your source rather than a specific file path.
Search for it in the Parallel Job Developer's Guide (5-28).

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
pongal
Participant
Posts: 77
Joined: Thu Mar 04, 2004 4:46 am

Post by pongal »

Hi Roy,
I don't understand what a file pattern is here.
I currently don't have the DataStage Parallel Job Developer's Guide PDF.
Can you please explain in detail?
Thanks
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
If you set the read method of your Sequential File stage to File Pattern and use wildcards in the file path (e.g. #path#/*), the stage will read all files in the specified path.
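
To preview which files a pattern such as #path#/* would pick up, it can help to expand the wildcard outside DataStage first. The following is a minimal Python sketch, assuming ordinary shell-style wildcard matching; the helper name files_matching and the sample file names are made up for illustration, and it says nothing about how the stage itself is implemented.

```python
import glob
import os
import tempfile


def files_matching(pattern):
    """Return the sorted list of regular files that a wildcard pattern matches."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


# Demonstration in a scratch directory (hypothetical file names).
with tempfile.TemporaryDirectory() as path:
    for name in ("a.txt", "b.txt", "notes.log"):
        with open(os.path.join(path, name), "w") as fh:
            fh.write("x\n")

    # "*" matches every file in the directory; "*.txt" narrows it down.
    all_files = [os.path.basename(p) for p in files_matching(os.path.join(path, "*"))]
    txt_files = [os.path.basename(p) for p in files_matching(os.path.join(path, "*.txt"))]
    print(all_files)
    print(txt_files)
```

A narrower pattern such as #path#/*.txt is usually safer than #path#/*, since stray files in the directory would otherwise be read as input too.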

IHTH,
Roy R.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

There is another way of processing these files: execute the cat command, or write a shell script, to concatenate all the files into one, and use the output file in the job. For this method to work, the number of files in the directory must be static and the metadata of the files must be the same.
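
The concatenation step could be sketched roughly as follows. This is an illustrative Python equivalent of cat, not a DataStage feature; the function name concatenate and the sample file names are assumptions. It presumes every input file shares the same layout and ends with a newline, otherwise the last record of one file would run into the first record of the next.

```python
import glob
import os
import shutil
import tempfile


def concatenate(pattern, target):
    """Append every file matching pattern to target, in sorted name order."""
    # Skip the target itself in case a previous run left it inside the pattern.
    sources = sorted(
        p for p in glob.glob(pattern)
        if os.path.abspath(p) != os.path.abspath(target)
    )
    with open(target, "wb") as out:
        for src in sources:
            with open(src, "rb") as fh:
                shutil.copyfileobj(fh, out)
    return sources


# Demonstration with two small delimited files (hypothetical names and data).
with tempfile.TemporaryDirectory() as d:
    for name, text in (("in1.txt", "1,a\n"), ("in2.txt", "2,b\n")):
        with open(os.path.join(d, name), "w") as fh:
            fh.write(text)
    merged_path = os.path.join(d, "merged.txt")
    concatenate(os.path.join(d, "in*.txt"), merged_path)
    with open(merged_path) as fh:
        merged = fh.read()
    print(merged, end="")
```

Note that the merged file is a second copy of all the input data, so this approach needs roughly twice the disk space of the inputs.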

HTWH.

Regards
Saravanan
pongal
Participant
Posts: 77
Joined: Thu Mar 04, 2004 4:46 am

Post by pongal »

Hi Saravanan,
Would this command work for me if I put it in a before-job subroutine as
ExecDOS copy /apps/Ascential/Projects/IDD/*.txt /a /apps/Ascential/Projects/IDD/ADTest.txt /a
or should I write a shell script and execute it in a before-job subroutine:
ExecSH merge.sh
Which of these would concatenate all the input files in the correct data format without any discrepancies?
My second question: is there any limit on how much data can be concatenated into a single text file?

My third question: if the files have the same extension but different formats (a different number of columns, or fixed-width versus delimited), is there any way to process all the files in a directory?
I think I am crossing my fingers...... :roll:
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Performing a type #PATH1#\*.* > #PATH2#\Myfile will be far slower (and the more data, the slower it gets) than simply reading the files and processing the read buffer!

That is my humble opinion and probably all I have to say for this post :)

Good Luck,
Roy R.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why go to all this trouble (and double the disk space needed) when the stage supports automatically reading multiple files with a wildcard pattern? :?
pongal wrote: is there any limit for concatenating the data for a single text file?
On Windows? Disk space.
pongal wrote: if file extension is same and file format different(like difference in no of columns or file format is different(fixed width,delimited) , is there any way to process the files in a directory?
Not in any way noted so far in this thread. You'll need jobs to get the various files into a common format before you can consider doing anything like processing them all at once.
-craig

"You can never have too many knives" -- Logan Nine Fingers
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

OK, I have more to say regarding Craig's note.

In some cases you have multiple files with the same schema, so my idea should work for that.

If you also have multiple schemas for the files, you have 2 options:

1. Use export/import stages and, further along, split according to type.
2. Use a good naming convention, or a translation table stating which job to run for each file.
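
The translation-table idea in option 2 might look something like the sketch below. The patterns and job names are purely hypothetical, and how each job actually gets launched is left out; the point is only that a file name decides which job handles the file.

```python
import fnmatch

# Hypothetical translation table: file-name pattern -> name of the job to run.
JOB_FOR_PATTERN = [
    ("cust_*.csv", "LoadCustomersDelimited"),
    ("cust_*.fix", "LoadCustomersFixedWidth"),
    ("ord_*.csv", "LoadOrders"),
]


def job_for(filename):
    """Return the job for the first matching pattern, or None if nothing matches."""
    for pattern, job in JOB_FOR_PATTERN:
        if fnmatch.fnmatchcase(filename, pattern):
            return job
    return None


def plan(filenames):
    """Group file names by job; names without a matching pattern go under None."""
    groups = {}
    for name in filenames:
        groups.setdefault(job_for(name), []).append(name)
    return groups


routing = plan(["cust_01.csv", "cust_02.fix", "ord_01.csv", "readme.txt"])
for job, names in routing.items():
    print(job, names)
```

Files that end up under None match no known layout and are better reported than silently fed to a job.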

But what exactly is the nature of the files (what do you need to do with them)?

Let's say you get the file list via FTP; what next?
Are they all in the same data format, and should they all be processed the same way?

IHTH,
Roy R.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Another option if the names are consistent - you can use a fileset to pull them in. You may have 100 different files with a .dat extension, containing multiple layouts, but the fileset only grabs those files you have specified.

The fileset is basically a list of files to process. Here's an example:

--Orchestrate File Set v1
--LFile
edwdev1:/u001/data/ddatd/dly_source1.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source2.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source3.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source4.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source5.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source6.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source7.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source8.ebc
--LFile
edwdev1:/u001/data/ddatd/dly_source9.ebc
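
A listing in the layout shown above could be generated from a list of paths with a few lines of script. This Python sketch reproduces only the lines visible in the example; the function name fileset_listing is made up, and whether DataStage accepts a hand-built descriptor, or expects further sections in it, is an assumption that would need checking.

```python
def fileset_listing(host, paths):
    """Build a listing in the layout of the example: a header, then one
    --LFile marker followed by host:path for each file."""
    lines = ["--Orchestrate File Set v1"]
    for path in paths:
        lines.append("--LFile")
        lines.append(f"{host}:{path}")
    return "\n".join(lines) + "\n"


# Host and paths taken from the example above, limited to the first three files.
listing = fileset_listing(
    "edwdev1",
    [f"/u001/data/ddatd/dly_source{i}.ebc" for i in range(1, 4)],
)
print(listing, end="")
```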
Otherwise, the file-naming conventions Roy mentioned should do the trick.

HTH.
Brad.