reading sequential files

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

reading sequential files

Post by sweetleaf »

Hi everyone,

A quick question: if I have a job where I want to read the contents of all the csv files (all having the same format) in a given directory into a database table, how can I do this? Can DataStage accept something like "/export/home/*" in the file name prompt?

Or, where it prompts for a file name (Sequential File stage), can I enter a file name like "Test*" so that DataStage will know to pick up and process anything starting with "Test"?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The FileName field of a Sequential File stage does not support wildcard constructs (or any form of regular expression).
You need to build a hierarchy of jobs: one processes the file names and sets a filename parameter in the other job, whose task is to process a single file. This can be achieved through job control (the Job Control tab on job properties) or with a job sequence; in either case you will need to create a loop that runs the child job N times, where N is the number of files.
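
A minimal sketch of the per-file piece of that job control, in DS BASIC, might look like this (the child job name "LoadOneFile", its parameter name "FileName" and the example file path are all hypothetical):

* Run the child job once for a single file (hypothetical job, parameter and path names)
hJob = DSAttachJob("LoadOneFile", DSJ.ERRFATAL)
ErrCode = DSSetParam(hJob, "FileName", "/export/home/Test1.csv")
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If JobStatus <> DSJS.RUNOK Then Call DSLogWarn("Child job finished with status " : JobStatus, "JobControl")
ErrCode = DSDetachJob(hJob)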


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

Hi Ray,

The only thing is that the file names and the number of files will never be known in advance.
I've actually come across something new to me: the Folder stage. In the folder properties I've defined the output (column names, the folder path that will contain my files, etc.) and created a link between a Folder stage and a Transformer stage. The problem is that when I compile, the error I get is "Stage doesn't have primary input".
How do I specify inputs in the Folder stage?

Thanks again!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You don't need to know the number of files or their names. Ray is suggesting that you create some job control that, in essence, does a 'dir' or an 'ls', captures the file names and their number, and then loops through the list. On each iteration, pull a file name off the list and pass it in as a parameter.
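
In DS BASIC job control that could be sketched roughly as below, assuming the files sit in /export/home and re-using the hypothetical "LoadOneFile" job and "FileName" parameter from the earlier sketch:

* Capture the file names with ls, then run the child job once per file
Call DSExecute("UNIX", "ls /export/home/Test*.csv", FileList, SysRet)
NumFiles = DCount(FileList, @FM)
For i = 1 To NumFiles
   ThisFile = FileList<i>
   If ThisFile = "" Then Continue  ;* skip any blank trailing line
   hJob = DSAttachJob("LoadOneFile", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "FileName", ThisFile)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)
   ErrCode = DSDetachJob(hJob)
Next i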

As to the Folder stage, have you looked at the online help? Ascential's help is generally very good. You can also read about it in the Server Job Developer's Guide; if you don't have a hardcopy, there is a PDF that is installed with the client software.

-craig
suresh
Participant
Posts: 8
Joined: Fri Nov 22, 2002 6:58 am

Post by suresh »

When you are using a Folder stage, the primary input to that stage is the folder path (the directory where your files reside). You can specify *.* in the filename input to read all the files from that folder. The output of the Folder stage can contain only two columns, one being the file name and the other the record contents, which you can pass to the Transformer stage.
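
If the record contents then need to be split on a delimiter downstream, one option is a small server routine around Field() called from the Transformer. A minimal sketch, assuming a comma delimiter and two hypothetical routine arguments Record and FieldNo:

* Transform function: return the FieldNo-th comma-delimited value from Record
Value = Field(Record, ",", FieldNo)
Ans = Trim(Value)

Such a routine (the name is up to you) would be called once per output column in a Transformer derivation, e.g. passing the Folder stage's record column and the field position.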
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

Hi guys,

Thanks for your suggestions - as you can tell, I'm a DataStage newbie.
Now that I've found the Job Control tab, I don't know what to type into it to tell DataStage which directory to look in, capture the file names and file count, then pass each file name as a parameter to another job.

I don't have the manuals - if anyone knows how I can achieve this, it would be greatly appreciated.
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

To reply to Suresh,

Hi there, as you can see I'm trying a few methods for getting .csv data into my database.
I've managed to set up a Folder stage which does exactly what you mentioned: it loads two columns into my table, FileName and record contents. The contents are quite large, often exceeding 3000 characters per entry.
With the data in this state, however, I still have to write a pretty hefty parsing program in PL/SQL that will take each delimited element and insert it into another table.
Is there any data massaging facility within DataStage to handle the data pulled in by the Folder stage so that it can be prepared for inserting into a database?

Thanks