Multiple Excel files for input

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
morneh
Participant
Posts: 17
Joined: Wed Jan 28, 2004 8:09 am
Contact:

Multiple Excel files for input

Post by morneh »

I haven't tried it yet but I thought I'd post here for someone to give me a sanity check and tell me I'm not over-complicating things :D

What we're trying to achieve is loading multiple excel spreadsheets into a SQL server database. The formats are the same but the names are different.

I'm thinking of creating an Excel ODBC pointing to a file, something like Source.xls

Somehow counting how many files there are, then looping through each one of them. After the job runs, I rename the Source.xls file and move it to an archive folder, take the next file, rename it to Source.xls and then the loading-job executes. The loop continues until the data folder is empty.

Is this the best way of doing it and is everything I'm thinking of doing here actually possible ? Especially counting how many files there are in the folder... don't have any clue how I'm going to achieve this yet.

The reasoning behind the loop is that I won't know how many source files there'll be, and I really don't want to create hundreds of ODBC connections, one for every excel file.

Any help/comments will be appreciated.

Thanks
Morne
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Morne,

with DS Server you can write a job using a FOLDER stage, which loops through all files that match a mask (i.e. *.xls) and pass them on to the next stage. This gives you the names of the files that you want to work with. Since the Windows dsn points a to file, you could try pointing it at file called "Temporary.xls" and then copying each of the real .xls files over that file, then calling up the processing portion of your DS job using Temporary.xls and it's ODBC dsn. Once that job is finished you then copy the next source file into Temporary.xls and so on.

I haven't tried this in practice, since Windows might be doing some ODBC buffering preventing immediate overwrite & re-read of the same source; but I would be surprised if it doesn't work.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I've seen things like this work, tough you might need a wait between access to the file and overwriting it with the next one.

if you intend using the folder stage:
1. be aware large files might fail the stage.
2. no directories are laowed in the directory you use the folder stage.

alternatives for using the folder stage include DS basic code invoking OS commands or basic code alone (all of wich are a search away from you :wink: ).

in fact having different table definitions for the excel files would only force you to have several loading jobs and a bit of logic and it will still work with the same 1 DSN logic.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Post Reply