Hi,
I am new to DataStage and I am looking for some guidance. I need to create a job that loads from multiple directories (each source dumps its files into its own directory) and uses Oracle SQL*Loader (sqlldr) to load the files. All files have the same layout.
I have created a parameterized Oracle batch job that works fine. I need to know how to search for all subdirectories under the base directory, then look for all files in each of those subdirectories that are no longer growing, and pass them as a parameter to my job.
Clark
Multiple input files
Moderators: chulett, rschirm, roy
quote: Originally posted by dtsipe
I don't understand where the problem is. Do you have multiple input files for SQL*Loader, or is the problem one step before that?
I have a directory ("/switches"); under this directory I have multiple subdirectories, one for each switch that dumps files ("71", "225", etc.). Each switch dumps files to its directory every 15 minutes or so. I need to loop through all the subdirectories and, if there is a file, use sqlldr to load the records into the database. There could be one or more files under each of the switch directories.
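A minimal shell sketch of the "no longer growing" check described above: record each file's size, wait, and re-check. The function name, the size-check interval, and the idea of passing the base directory as an argument are my assumptions, not something from this thread; tune the wait to how your switches write their dumps.

```shell
# Sketch only: print files under <base>/<switch>/ whose size has stopped
# changing, i.e. the switch has (probably) finished writing them.
# find_stable_files and the wait interval are illustrative assumptions.
find_stable_files() {
    base="$1"
    wait_secs="${2:-2}"               # seconds to wait before re-checking
    for f in "$base"/*/*; do
        [ -f "$f" ] || continue       # skip anything that is not a plain file
        size1=$(wc -c < "$f")
        sleep "$wait_secs"            # give a still-growing file time to grow
        size2=$(wc -c < "$f")
        [ "$size1" -eq "$size2" ] && echo "$f"
    done
}
```

Each stable filename this prints could then be passed as the parameter to the sqlldr job. Note the size comparison is a heuristic; a switch that pauses mid-write longer than the wait interval would slip through.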
Use the DataStage scheduler in Director to submit a batch that loops through all the subdirectories. You can adjust the time interval and frequency of execution on both Unix and NT platforms.
Enable the "Allow multiple instance" feature, because you may kick off the next instance before the previous one has finished.
Regarding searching OS directories, I think the best way is to use an OS script invoked from a DataStage job. I don't know which OS you use, so I can't be more specific.
Finally, you will have two jobs: the first invokes an OS script to get a file name, and the second loads that file with SQL*Loader.
Wrap both of them into a batch that is executed through the scheduler.
Regards.
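The second job's "OS script" step could look something like the sketch below: loop over the files and hand each one to sqlldr, then rename it so the next run skips it. Everything here is a placeholder, assumed for illustration: the function name, the control-file argument, and the rename-to-`.loaded` convention. A real invocation would also need a `userid=` connect string, which is omitted here; the loader command is overridable so the loop can be dry-run without a database.

```shell
# Sketch of the per-file sqlldr wrapper. Control file, rename convention,
# and the overridable $loader are assumptions; credentials are omitted.
load_switch_files() {
    base="$1"
    ctl="$2"                       # path to the SQL*Loader control file
    loader="${3:-sqlldr}"          # substitute a stub here to test the loop
    for f in "$base"/*/*; do
        [ -f "$f" ] || continue
        case "$f" in *.loaded|*.log) continue ;; esac  # skip processed files
        if "$loader" control="$ctl" data="$f" log="$f.log"; then
            mv "$f" "$f.loaded"    # rename so the next pass skips it
        fi
    done
}
```

Moving loaded files to a separate archive directory instead of renaming in place would work equally well; the point is that a file must never be eligible for loading twice.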
DataStage BASIC has the capability to search through subdirectories, so you can create a job control routine including such functionality. Essentially the technique is to open "/switches" as if it were a table (using the OpenPath statement), establish a Select List of filenames (as if they were record IDs), and process them one at a time. When done, move them to another location, rename them, or some such. Alternatively, open each file with OpenSeq, then use the Status statement to determine the date and time modified. Don't forget to use CloseSeq to close anything opened with OpenSeq.
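An outline of that routine in DataStage BASIC might look like the sketch below. This is untested pseudocode based on the statements named above (OpenPath, Select, ReadNext, OpenSeq, Status, CloseSeq); the variable names are mine, and the field positions in the dynamic array returned by Status vary by release, so check the BASIC manual for your version.

```
* Sketch only -- job control routine outline, not tested code.
* Open /switches as if it were a file and select its entries.
OpenPath "/switches" To SwitchDir Else Stop "Cannot open /switches"
Select SwitchDir                  ;* build a select list of entry names
Loop
   ReadNext SubDirName Else Exit  ;* each record ID is a subdirectory name
   FilePath = "/switches/" : SubDirName
   * Repeat the OpenPath/Select for each subdirectory, then for each
   * file check whether it has stopped growing:
   OpenSeq FilePath To FileVar Then
      Status FileInfo From FileVar Then
         * FileInfo holds size and modification time; the exact field
         * positions depend on your release -- see the BASIC manual.
      End
      CloseSeq FileVar            ;* always close what OpenSeq opened
   End
Repeat
```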
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518