Using Folder Stage when there are multiple source files

Post questions here related to DataStage Server Edition for areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

tombastian
Premium Member
Posts: 41
Joined: Fri Jun 04, 2004 5:52 am
Location: Bangalore

Post by tombastian »

Hi,
I would like to know how to use the Folder stage in DataStage. I have designed the job as follows: FolderStage (with multiple files in the source directory) > SequentialFileStage > TransformerStage > ORAOCI (target table). On the Folder stage's Outputs > Columns tab, I have defined one column for the file name (EmpData) with these properties: Group=No, Key=Yes, SQL Type=Unknown. What should I give as the file name on both the Input and Output tabs of the next Sequential File stage? Can I give both values as <Folder Path Name Value>\#EmpData, for example D:\Ds\Empdata\#EmpData? On the Output tab of the Sequential File stage, I have given the complete column definitions for the file, as we usually do for sequential files. The job compiles but fails with the error "Required Column Missing". Please help.

Thanks,
Tom
sandhya
Participant
Posts: 6
Joined: Fri Oct 17, 2003 4:48 am

Post by sandhya »

The Folder stage has two columns: the file name and the record. The record column holds the contents of the file. I think the Folder stage in your job does not have the record column defined.

regards,
sandhya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

The folder stage has a maximum of two "columns". The first gives you the file name, the second contains the file contents.

You should not be trying to put these into a single sequential file. Lose your first Sequential File stage. Pass directly through to the Transformer stage, in which you can decompose the file's contents into appropriate columns for loading into Oracle.
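A minimal sketch of that decomposition, written as a hypothetical server routine named ExtractField with arguments Contents, LineNo and FieldNo (all three names are assumptions, not part of the product). It assumes the Folder stage's contents column still separates lines with line feeds, possibly with carriage returns left over from Windows files, and that fields are comma-delimited; check your own files before relying on either assumption. Each Folder stage row is one whole file, so a Transformer calling this produces one output row per file, which fits files holding a single record or cases where you want fields from a known line.

```basic
* Hypothetical server routine: ExtractField(Contents, LineNo, FieldNo)
*   Contents : the Folder stage's file-contents column
*   LineNo   : which line of the file to read (1-based)
*   FieldNo  : which comma-delimited field of that line to return (1-based)
* Assumption: lines are separated by LF (Char(10)), fields by commas.

* Take the requested line out of the whole-file contents.
FileLine = Field(Contents, Char(10), LineNo)

* Drop any CR left over from CR/LF (Windows) line endings.
Convert Char(13) To "" In FileLine

* Return the requested field; Field() returns "" when FieldNo is past
* the last field on the line.
Ans = Field(FileLine, ",", FieldNo)
```

In the Transformer, an output column derivation would then call it as ExtractField(FolderIn.Record, 1, 2), where FolderIn and Record stand in for your own link and column names.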
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tombastian

Post by tombastian »

Hi Sandhya & Ray,
Thanks for your replies. As Sandhya mentioned, I had not given the column definition for Rows in the Folder stage. Now I've given it, and when I click the View Data button on the Sequential File stage's Input tab, it shows the file names and the data against them in the grid. But when I click View Data on the Output tab of the same Sequential File stage, it throws the error "Required Column Missing". Running the job throws the same error. I have given the same file name parameter on both the Input and Output tabs of the Sequential File stage (e.g. D:\test\#DataFile).

Ray, I'm using multiple flat files and have designed the job as follows: FolderStage > SequentialFileStage > TransformerStage > ORAOCI for inserting, plus a SequentialFileStage for rejects. As per your message, I tried it without the Sequential File stage, but it still fails. I haven't used the Folder stage before. Could you please help me solve this problem?

Thanking You,
Tom.
ray.wurlod

Post by ray.wurlod »

1. Can you document what you want to achieve (not what you want to do with DataStage; what you want to achieve)?

2. Job parameter references need a trailing "#" character; your samples lack this. That is, you specify D:\test\#DataFile when you should be specifying D:\test\#DataFile# to use a job parameter value.

I expect, from the message you describe, that either the Format or the Columns settings do not match between the Inputs and Outputs tabs of the Sequential File stage. More precisely, I suspect that you only have the file name defined on the Outputs link. If the links are accessing the same file, then clearly they must match: two columns on each and the same file format.
tombastian

Post by tombastian »

Hi Ray,
I want to test how the Folder stage works with flat files. For this purpose I've created a job with the following stages: FolderStage > SequentialFileStage > TransformerStage > OraOciStage. As you suggested in one of your replies in the forum, I could use OS commands to append the data to a single file and load it using a Sequential File stage. But I would like to test my job with the Folder stage and understand how it works. I am stuck at the mapping between the Folder stage and the Sequential File stage. In the Folder stage I've defined two columns, one for the file name (e.g. DataFile) and the other for the records in the files (e.g. FileRecrds). I'm confused by the mappings in the Sequential File stage. On the Input tab of the Sequential File stage I have the columns from the Folder stage, and on the Output tab I have four columns that map to the data in the files. For the file name in the Sequential File stage, I have given <Folder Path Name Value>\#DataFile (DataFile is the column name representing the file names in the Folder stage). I hope this is the way the Folder stage can be used to load files. My job fails with the error "Required Column Missing". I look forward to your reply on this problem.

Thanks & Regards,
Tom.
ray.wurlod

Post by ray.wurlod »

You are trying to load the entire folder contents into a single text file.
Is this text file in the folder in question? (Eek!!)

Try FolderStage > TransformerStage > OraOciStage as a preferred design.

This is what I meant earlier when I suggested that you "lose the Sequential File stage".
tombastian

Post by tombastian »

Thank You Ray for your reply.

Regards,
Tom.
john@dtsisoftware.com
Participant
Posts: 6
Joined: Mon Dec 27, 2004 3:32 pm

Handling multiple flat files in sequence

Post by john@dtsisoftware.com »

[quote="ray.wurlod"]You are trying to load the entire folder contents into a single text file.
Is this text file in the folder in question? (Eek!!)

Try [b]FolderStage > TransformerStage > OraOciStage [/b]as a preferred design.

This is what I mean earlier when I suggested that you "lose the Sequential File stage".[/quote]

Greetings, Ray.
I have been reviewing your postings on this subject with great interest. This is my first DataStage project, and after reading all of the docs (some twice) and browsing through DSXchange for a day, I am still puzzled.

I am trying something similar to Tom in the original posting: I have multiple flat files in an MS Windows directory structure and need to loop through them, transform them and write them to an Oracle database. So far, I have been able to do this successfully when there is only one file in the folder; when there are two files to read in sequence, my job will not progress to the second file.

I tried loading all of the source files into one directory, with the idea of moving one file at a time to a separate staging directory, processing that single file, moving it out, and then moving the next file in from the source directory. I had no luck moving one file at a time; all of the files moved instead of just one.

I tried to concatenate the files into one big file with the cat * command and process that one file, again with no luck; these files are huge and it fails.

I looked for prepackaged routines or functions to handle one file at a time, with no luck.

Any thoughts on where I might be missing something would be appreciated.

Greatest regards and I thank you kindly in advance for any response you might have.
John
ray.wurlod

Post by ray.wurlod »

Welcome aboard! :D

The Folder stage is notorious (at least in version 7.1 and earlier) for not being able to handle large files in the folder.

Create a server job to process one sequential file into the target table. Make the name of the file a job parameter.
Create another server job. Open its properties window and choose the tab captioned Job Control. You will see a blank edit field. Here you can create a job control routine. Here are the basics of it. Error detection has been omitted for clarity; I would not do so in practice.


DirPath = DSGetParamInfo(DSJ.ME, "DirPath", DSJ.PARAMVALUE)
OpenPath DirPath To Dir.fvar On Error
   Call DSLogWarn("Can not open directory " : Quote(DirPath), "Job Control")
End Then
   * Create "Select List" of file names in directory
   ClearSelect 10
   Select Dir.fvar To 10
   * For each file name attach the job, run the job, wait for it to finish,
   * and detach the job. Only the bare file name is passed, so TheRealJob
   * must supply the directory part of the path itself.
   Loop
   While ReadNext FileName From 10
      hJob = DSAttachJob("TheRealJob", DSJ.ERRNONE)
      ErrCode = DSSetParam(hJob, "FileName", FileName)
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob)
      ErrCode = DSDetachJob(hJob)
   Repeat
End Else
   Call DSLogWarn("Can not open directory " : Quote(DirPath), "Job Control")
End
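Since error detection was left out above for clarity, here is a sketch of the same loop with some basic checks added. It assumes the child job is still called TheRealJob but that its FileName parameter now receives the full path (directory, a Windows "\" separator, and file name), so the child's Sequential File stage would use #FileName# as its file name; that parameter convention, and DirPath having no trailing backslash, are assumptions. It also resets the child job when an earlier run left it aborted, because an aborted job has to be reset before it can run again.

```basic
* Sketch: run the hypothetical job TheRealJob once per file in DirPath.
* Assumes TheRealJob's FileName parameter takes the full path of the file.
DirPath = DSGetParamInfo(DSJ.ME, "DirPath", DSJ.PARAMVALUE)

OpenPath DirPath To Dir.fvar Else
   * DSLogFatal logs the message and aborts this controlling job.
   Call DSLogFatal("Cannot open directory " : Quote(DirPath), "Job Control")
End

* Build select list 10 from the file names in the directory.
ClearSelect 10
Select Dir.fvar To 10

Loop
While ReadNext FileName From 10
   * DSJ.ERRFATAL: abort this job if TheRealJob cannot be attached.
   hJob = DSAttachJob("TheRealJob", DSJ.ERRFATAL)

   * A job left aborted by an earlier run must be reset before it can run.
   JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
   If JobStatus = DSJS.RUNFAILED Or JobStatus = DSJS.CRASHED Then
      ErrCode = DSRunJob(hJob, DSJ.RUNRESET)
      ErrCode = DSWaitForJob(hJob)
      * Detach and re-attach before running the same job again.
      ErrCode = DSDetachJob(hJob)
      hJob = DSAttachJob("TheRealJob", DSJ.ERRFATAL)
   End

   * Pass the full path (Windows separator assumed).
   ErrCode = DSSetParam(hJob, "FileName", DirPath : "\" : FileName)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)

   * Check how the run finished before moving on to the next file.
   JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
   If JobStatus = DSJS.RUNOK Or JobStatus = DSJS.RUNWARN Then
      Call DSLogInfo("Loaded " : Quote(FileName), "Job Control")
   End Else
      Call DSLogWarn("Load failed for " : Quote(FileName), "Job Control")
   End

   ErrCode = DSDetachJob(hJob)
Repeat
```

The select returns every entry in the directory, so if the folder can hold files you do not want loaded, test FileName (for example, its extension) at the top of the loop before attaching the job.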