reading sequential files

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

reading sequential files

Post by sweetleaf »

Hi everyone,

A quick question: if I have a job where I want to read the contents of all the csv files (all having the same format) in a given directory into a database table, how can I do this? Can DataStage accept something like "/export/home/*" in the file name prompt?

Or, where it prompts for a file name (Sequential File stage), can I enter a file name like "Test*" so that DataStage will know to pick up and process anything starting with "Test"?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The FileName field of a Sequential File stage does not support wildcard constructs (or any form of regular expression).
You need to build a hierarchy of jobs: one processes the file names and sets a filename parameter in the other job, whose task is to process a single file. This can be achieved through job control (the Job Control tab on job properties) or with a job sequence; in either case you will need to create a loop that runs the child job N times, where N is the number of files.
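
A minimal sketch of the per-file piece of that job control, in DS BASIC, might look like this (the child job name "LoadOneFile", its parameter name "FileName" and the example file path are all hypothetical):

* Run the child job once for a single file (hypothetical job, parameter and path names)
hJob = DSAttachJob("LoadOneFile", DSJ.ERRFATAL)
ErrCode = DSSetParam(hJob, "FileName", "/export/home/Test1.csv")
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If JobStatus <> DSJS.RUNOK Then Call DSLogWarn("Child job finished with status " : JobStatus, "JobControl")
ErrCode = DSDetachJob(hJob)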


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

Hi Ray,

The only thing is that the file names and the number of files will never be known in advance.
I've actually come across something new to me: the Folder stage. In the folder properties I've defined the output (column names, the folder path that will contain my files, etc.) and created a link between a Folder stage and a Transformer stage. The problem is that when I compile, the error I get is "Stage doesn't have primary input".
How do I specify inputs in the Folder stage?

Thanks again!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You don't need to know the number of files or their names. Ray is suggesting that you create some job control that, in essence, does a 'dir' or an 'ls', captures the file names and their number, and then loops through the list. On each iteration, pull a file name off the list and pass it in as a parameter.
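
In DS BASIC job control that could be sketched roughly as below, assuming the files sit in /export/home and re-using the hypothetical "LoadOneFile" job and "FileName" parameter from the earlier sketch:

* Capture the file names with ls, then run the child job once per file
Call DSExecute("UNIX", "ls /export/home/Test*.csv", FileList, SysRet)
NumFiles = DCount(FileList, @FM)
For i = 1 To NumFiles
   ThisFile = FileList<i>
   If ThisFile = "" Then Continue  ;* skip any blank trailing line
   hJob = DSAttachJob("LoadOneFile", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "FileName", ThisFile)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)
   ErrCode = DSDetachJob(hJob)
Next i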

As to the Folder stage, have you looked at the online help? Ascential's help is generally very good. You can also read about it in the Server Job Developer's Guide; if you don't have a hardcopy, there is a PDF that is installed with the client software.

-craig
suresh
Participant
Posts: 8
Joined: Fri Nov 22, 2002 6:58 am

Post by suresh »

When you are using a Folder stage, the primary input to that stage is the folder path (the directory where your files reside). You can specify *.* in the filename input to read all the files from that folder. The output of the Folder stage can contain only two columns, one being the file name and the other the record contents, which you can pass to the Transformer stage.
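
If the record contents then need to be split on a delimiter downstream, one option is a small server routine around Field() called from the Transformer. A minimal sketch, assuming a comma delimiter and two hypothetical routine arguments Record and FieldNo:

* Transform function: return the FieldNo-th comma-delimited value from Record
Value = Field(Record, ",", FieldNo)
Ans = Trim(Value)

Such a routine (the name is up to you) would be called once per output column in a Transformer derivation, e.g. passing the Folder stage's record column and the field position.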
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

Hi guys,

Thanks for your suggestions - as you can tell, I'm a DataStage newbie.
Now that I've found the Job Control tab, I don't know what to type into it to tell DataStage which directory to look in, capture the file names and file count, then pass each file name as a parameter to another job.

I don't have the manuals - if anyone knows how I can achieve this, it would be greatly appreciated.
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

To reply to Suresh,

Hi there, as you can see I'm trying a few methods for getting .csv data into my database.
I've managed to set up a Folder stage which does exactly what you mentioned: it loads two columns into my table, FileName and record contents. The contents are quite large, often exceeding 3000 characters per entry.
With the data in this state, however, I still have to write a pretty hefty parsing program in PL/SQL that will take each delimited element and insert it into another table.
Is there any data massaging facility within DataStage to handle the data pulled in by the Folder stage so that it can be prepared for inserting into a database?

Thanks