Processing files in a directory

sashah · Post by **sashah** » Thu May 10, 2007 3:31 pm

What would be the best way to load data from multiple files in a directory into an Oracle database? All the files have the same structure. I tried using the Folder stage but was not successful.

chulett · Post by **chulett** » Thu May 10, 2007 3:42 pm

Welcome! :D

Typically, you concatenate the files and then process the concatenated file. Alternatively, use a looping Sequence that reads the filenames and runs a processing job once per filename.

ray.wurlod · Post by **ray.wurlod** » Thu May 10, 2007 4:24 pm

In the Sequential File stage specify a Filter command. The filter command to use is cat file1 file2 file3 file4... (or you could use wildcards if your file names are amenable). The Sequential File stage then reads the output of the cat command.

nick.bond · Post by **nick.bond** » Thu May 10, 2007 4:38 pm

ray.wurlod wrote: (or you could use wildcards if your file names are amenable)

Ray, I recently found that I couldn't use wildcards in the Filter command, and Craig ran into the same issue. Do you know for certain that it can be done? Is there some specific syntax to get it to work, or does it just not work in our version of DS (7.5.1)?

chulett · Post by **chulett** » Thu May 10, 2007 10:07 pm

Perhaps it's an O/S thing? Mine is HP-UX 11.11, a.k.a. 11i.

nick.bond · Post by **nick.bond** » Thu May 10, 2007 11:29 pm

Same here: HP-UX B.11.11.
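Nobody in the thread pins down why the wildcard fails. One possible cause, offered here only as an assumption, is that the Filter command is started without a shell: wildcard expansion is a shell feature, so cat would then receive `*.csv` as a literal file name. The sketch below shows that difference in plain sh. The function names and files are hypothetical, and whether a command of the form `sh -c '...'` behaves the same way in the Filter option on 7.5.1 is untested.

```shell
#!/bin/sh
# Illustrates that wildcard expansion is done by the shell, not by cat.
# Directory, file and function names are hypothetical.

# Pass the pattern to cat exactly as written: nothing expands it, so cat
# looks for a file literally named "*.csv" and fails.
cat_literal() {
    cat "$1/*.csv"
}

# Hand the same pattern to sh -c so that a shell expands the wildcard first.
cat_via_shell() {
    sh -c 'cat "$1"/*.csv' sh "$1"
}

dir=$(mktemp -d)
printf 'a,1\n' > "$dir/file1.csv"
printf 'b,2\n' > "$dir/file2.csv"

cat_literal "$dir" 2>/dev/null || echo "literal pattern: no such file"
cat_via_shell "$dir"    # prints a,1 then b,2

rm -r "$dir"
```

If the Filter option really does bypass the shell, wrapping the command in `sh -c` would be one thing to try; otherwise, concatenating in a before-job ExecSH call, which runs through a shell, sidesteps the question entirely.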

sashah · Post by **sashah** » Fri May 11, 2007 7:25 am

Thank you for the reply. How would I go about concatenating the files from within DataStage?


DSguru2B · Post by **DSguru2B** » Fri May 11, 2007 7:28 am

Craig and Ray have already answered your question. Use the cat command in the before-job subroutine 'ExecSH' to concatenate all the files you want and redirect the output to a different file. Then use that file inside your job.

cat file1 file2 file3 > fileAll

where file1, file2, file3 and fileAll should be fully qualified paths.

chulett · Post by **chulett** » Fri May 11, 2007 7:30 am

The easiest way would be to run the wildcard 'cat' as a before-job subroutine using ExecSH, redirecting the output to a fixed filename. Look under Job Properties if you're not sure where that lives. Or try doing it in the Filter option of the Sequential File stage, as others have mentioned.

Or you could build a 'batch' job that did the same using DSExecute and run it before the processing job, but I'd suggest one of the previous methods unless you've coded batch jobs before.

chulett · Post by **chulett** » Fri May 11, 2007 7:33 am

DSguru2B wrote:where file1, file2, file3 and fileAll should be fully qualified.

Or you can 'cd' and then cat in two steps:

cd #P_FILE_DIR# && cat *.csv > cat_file.csv

This also shows that you can use your job parameters in the commands. You are using parameters for things like this, yes?
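One thing to watch with the `cd ... && cat *.csv > cat_file.csv` command: cat_file.csv itself matches `*.csv`, so on a re-run the previous output is picked up by the wildcard. Depending on the cat implementation and the order the names expand in, that can make cat fail (GNU cat, for instance, can report "input file is output file") or read back its own output. Below is a minimal sketch of one way around it, assuming it is acceptable to write the combined file outside the input directory. The function name and paths are hypothetical; in a before-job ExecSH call you would substitute your own parameters such as #P_FILE_DIR#.

```shell
#!/bin/sh
# Concatenate every *.csv in a source directory into one file that lives
# outside that directory, so re-runs never pick up the previous output.
# concat_csv and the paths passed to it are hypothetical examples.
concat_csv() {
    src_dir=$1        # directory holding the input files
    out_file=$2       # combined output; should not be inside src_dir

    # Expand the pattern into this function's positional parameters.
    # If nothing matches, the shell leaves the literal pattern, which
    # does not exist as a file, so report it instead of calling cat.
    set -- "$src_dir"/*.csv
    if [ ! -e "$1" ]; then
        echo "no .csv files in $src_dir" >&2
        return 1
    fi

    # Write to a temporary name first, then rename, so nothing downstream
    # ever sees a half-written file.
    cat "$@" > "$out_file.tmp" && mv "$out_file.tmp" "$out_file"
}
```

For example, `concat_csv /data/incoming /data/work/cat_file.csv` (hypothetical paths) can be run any number of times and always produces just the contents of the input files.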

sashah · Post by **sashah** » Fri May 11, 2007 7:37 am

Thank you for your reply. I will try it.


sashah · Post by **sashah** » Fri May 11, 2007 10:44 am

Is there a way to execute a DataStage job from a shell script? I have a Unix script that reads the files in a directory, and I would like to run the DataStage job once for each file.

Thanks for your help

chulett · Post by **chulett** » Fri May 11, 2007 11:38 am

A 'better' answer would be to build a Sequence job using the new Start Loop, End Loop and UserVariables stages to get the list of files and run the processing job iteratively, passing it a new filename each time.

What version of DataStage do you have, one of the 7.5.x releases with the stages I mentioned?

Or if you are more comfortable with a shell script, search the forum for the dsjob command, which is how you launch a job from the command line. It is documented in the Server Job Developer's Guide pdf in the Command Line Interface section near the end of the guide.
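For the shell-script route, here is a minimal sketch of the loop. It assumes a processing job that takes the file name as a job parameter; the project, job and parameter names (MyProject, ProcessFile, P_FILE_NAME) are hypothetical. It also assumes that with `-jobstatus` dsjob waits for the job and exits with the job's status code; check the options and exit codes against the Command Line Interface section of the guide for your release.

```shell
#!/bin/sh
# Run a DataStage job once per file in a directory via the dsjob CLI.
# MyProject, ProcessFile and P_FILE_NAME are hypothetical names; set DSJOB
# to the full path of dsjob if it is not on the PATH.
DSJOB=${DSJOB:-dsjob}
PROJECT=${PROJECT:-MyProject}
JOB=${JOB:-ProcessFile}

run_for_each_file() {
    dir=$1
    for f in "$dir"/*; do
        # Skip sub-directories and the literal pattern left when nothing matches.
        [ -f "$f" ] || continue

        # -jobstatus makes dsjob wait for the job to finish.
        "$DSJOB" -run -jobstatus -param "P_FILE_NAME=$f" "$PROJECT" "$JOB"
        rc=$?

        # Assumption: the exit code is then the job status, where
        # 1 = finished OK and 2 = finished with warnings.
        if [ "$rc" -ne 1 ] && [ "$rc" -ne 2 ]; then
            echo "job $JOB failed for $f (exit code $rc)" >&2
            return 1      # stop rather than keep launching after a failure
        fi
    done
    return 0
}
```

For example, `run_for_each_file /data/incoming` (hypothetical path). The files are processed one at a time and the loop stops at the first failure, since an aborted job generally has to be reset before it will run again.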

sashah · Post by **sashah** » Fri May 11, 2007 11:46 am

Thank you. I will try the Sequence job.


sashah · Post by **sashah** » Fri May 11, 2007 12:38 pm

Is there anywhere I can see an example of using the Start Loop, End Loop and User Variables stages?

Thank you


DSXchange
