Multiple input files

clarkf
Participant
Posts: 5
Joined: Tue May 27, 2003 3:12 pm
Location: USA


Post by clarkf »

Hi,

I am new to DataStage and am looking for some guidance. I need to create a job that loads files from multiple directories (each source dumps its files into its own directory) using Oracle sqlldr. All the files have the same layout.
I have created a parameterized Oracle batch job that works fine. What I need to know is how to find all subdirectories under the base directory, then find all files in each of those subdirectories that are no longer growing, and pass them as a parameter to my job.

Clark
dtsipe
Participant
Posts: 8
Joined: Fri May 02, 2003 9:12 am
Location: Canada

Post by dtsipe »

I don't understand where the problem is. Is it that you have multiple input files for SQL*Loader, or is it in the step before that?
clarkf
Participant
Posts: 5
Joined: Tue May 27, 2003 3:12 pm
Location: USA

Post by clarkf »

quote:Originally posted by dtsipe
I don't understand where the problem is. Is it that you have multiple input files for SQL*Loader, or is it in the step before that?


I have a directory ("/switches"). Under it there are multiple subdirectories, one for each switch that dumps files ("71", "225", etc.). Each switch dumps files to its directory every 15 minutes or so. I need to loop through all the subdirectories and, wherever there is a file, use sqlldr to load its records into the database. There could be one or more files under each of the switch directories.
dtsipe
Participant
Posts: 8
Joined: Fri May 02, 2003 9:12 am
Location: Canada

Post by dtsipe »

Use the DataStage scheduler in Director to submit a batch that loops through all the subdirectories. You can adjust the interval and frequency of execution on both Unix and NT platforms.
Enable the "Allow Multiple Instance" option, because the next instance may kick off before the previous one has finished.
As for searching the OS directories, I think the best way is to use
an OS script invoked from a DataStage job. I don't know which OS you use, so I cannot be more specific.
In the end you will have two jobs: the first invokes the OS script to get the file name, and the second loads that file with the Loader.
Wrap both of them in a batch that is executed through the scheduler.
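
To make the "OS script" idea a bit more concrete, here is a rough sketch in Python (just one possible choice of scripting language; a shell script would do as well). It assumes the layout described above, with one subdirectory per switch under a base directory, and it treats a file as "not still growing" if it has not been modified for a couple of minutes. The function name `find_stable_files` and the 120-second default are assumptions made for illustration, not anything DataStage provides.

```python
import os
import time


def find_stable_files(base_dir, min_age_seconds=120, now=None):
    """Return paths of files in the immediate subdirectories of base_dir
    that have not been modified for at least min_age_seconds.

    Assumption: a file whose modification time is that old is no longer
    being written to by the switch.
    """
    now = time.time() if now is None else now
    stable = []
    for entry in sorted(os.listdir(base_dir)):
        subdir = os.path.join(base_dir, entry)
        if not os.path.isdir(subdir):
            continue  # ignore stray files sitting directly in the base directory
        for name in sorted(os.listdir(subdir)):
            path = os.path.join(subdir, name)
            if not os.path.isfile(path):
                continue  # ignore nested directories
            if now - os.path.getmtime(path) >= min_age_seconds:
                stable.append(path)
    return stable
```

Calling `find_stable_files("/switches")` would return the list of candidate files, and each path could then be passed as the file-name parameter to the load job. The age threshold is a heuristic; if the switches can pause for longer than that in the middle of a write, a larger value or a different completeness check would be needed.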

Regards.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

DataStage BASIC has the capability to search through subdirectories, so you can create a job control routine that includes this functionality. Essentially, the technique is to open "/switches" as if it were a table (using the OpenPath statement), establish a Select List of file names (as if they were record IDs), and process these one at a time. When done with a file, either move it to another location or rename it, so that it is not picked up again on the next run. Alternatively, open each file with OpenSeq, then use the Status statement to determine the date and time it was last modified. Don't forget to use CloseSeq to close anything opened with OpenSeq.
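
The "process one at a time, then move" part of this can be sketched outside DataStage as well. The following is Python rather than DataStage BASIC, so it illustrates the pattern only and is not the job control routine itself. The names `process_and_archive`, `load_one`, and `archive_dir` are hypothetical; `load_one` stands in for whatever actually runs the load (for example, starting the parameterized job).

```python
import os
import shutil


def process_and_archive(paths, load_one, archive_dir):
    """Call load_one(path) for each file; on success move the file into
    archive_dir so that a later run does not pick it up again.

    load_one is a caller-supplied function (hypothetical here) that raises
    an exception if the load fails. Failed files are left where they are.
    """
    os.makedirs(archive_dir, exist_ok=True)
    loaded, failed = [], []
    for path in paths:
        try:
            load_one(path)
        except Exception:
            failed.append(path)  # leave the file in place for a retry
            continue
        # Prefix the switch directory name so that files with the same
        # name from different switches do not overwrite each other.
        switch = os.path.basename(os.path.dirname(path))
        target = os.path.join(archive_dir, f"{switch}_{os.path.basename(path)}")
        shutil.move(path, target)
        loaded.append(target)
    return loaded, failed
```

Moving a file only after a successful load means a failed load is simply retried on the next scheduled run, at the cost of needing some other way to notice a file that fails every time.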

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518