Processing files in a directory

sashah
Participant
Posts: 37
Joined: Thu May 10, 2007 3:02 pm

Processing files in a directory

Post by sashah »

What would be the best way to load data from multiple files residing in a directory into an Oracle database? All the files have the same structure. I tried using the Folder stage but was not successful.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Welcome! :D

Typically, you concatenate the files and then process the concatenated file. Or you build a looping Sequence that reads the filenames and runs a processing job once per filename.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In the Sequential File stage specify a Filter command. The filter command to use is cat file1 file2 file3 file4... (or you could use wildcards if your file names are amenable). The Sequential File stage then reads the output of the cat command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

ray.wurlod wrote:(or you could use wildcards if your file names are amenable)
Ray, I recently found that I couldn't use wildcards in the filter command, and Craig found the same issue. Do you know that it can be done? Is there some specific syntax to get it to work? Or is it just not working in our version of DS (7.5.1)?
Regards,

Nick.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps it's an O/S thing? Mine is HP-UX 11.11, a.k.a. 11i...
-craig

"You can never have too many knives" -- Logan Nine Fingers
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Same here, HP-UX B.11.11.
Regards,

Nick.
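One possible explanation, not confirmed anywhere in this thread: wildcards are expanded by the shell, not by cat itself, so a command that is launched without a shell hands the literal pattern to cat. The sketch below only demonstrates that general shell behaviour in a scratch directory. Whether this is what happens inside the Sequential File stage filter on 7.5.1 is an assumption, and the file names are made up.

```shell
#!/bin/sh
# Scratch directory with two made-up input files.
workdir=$(mktemp -d)
printf 'a,1\n' > "$workdir/part1.csv"
printf 'b,2\n' > "$workdir/part2.csv"
cd "$workdir" || exit 1

# Quoted pattern: nothing expands it, so cat looks for a file
# that is literally named "*.csv" and fails.
literal_status=0
cat '*.csv' > /dev/null 2>&1 || literal_status=$?

# Same pattern handed to an explicit shell, which expands it
# into part1.csv part2.csv before cat runs.
expanded=$(sh -c 'cat *.csv')

echo "literal pattern exit status: $literal_status"
echo "$expanded"
```

If the filter really is run without a shell, wrapping the command as sh -c 'cat *.csv' would be one thing to try.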
sashah
Participant
Posts: 37
Joined: Thu May 10, 2007 3:02 pm

Concatenation of files

Post by sashah »

Thank you for the reply. How would I go about concatenating the files from within DataStage?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Craig and Ray have already answered your question. Use the cat command in the before-job subroutine 'ExecSH' to concatenate all the files you want and redirect the output to a separate file. Use this file inside your job.

cat file1 file2 file3 > fileAll
where file1, file2, file3 and fileAll should be fully qualified.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
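A minimal sketch of the same concatenation as a reusable shell function, assuming you want the step to fail cleanly when an input is missing instead of producing a partial file. The function name concat_to is made up for illustration.

```shell
#!/bin/sh
# concat_to OUTPUT INPUT...
# Concatenates the INPUT files, in the order given, into OUTPUT.
# Refuses to run if any input is unreadable, so no partial file is written.
concat_to() {
    # Need an output name and at least one input; without inputs
    # cat would sit waiting on standard input.
    [ "$#" -ge 2 ] || { echo "usage: concat_to OUTPUT INPUT..." >&2; return 2; }
    out=$1
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read input file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}
```

It would be called with fully qualified names, for example concat_to /some/dir/fileAll /some/dir/file1 /some/dir/file2 (the paths here are placeholders).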
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The easiest way would be to issue the wildcard 'cat' before-job using ExecSH to a fixed filename. Check under the Job Properties tab if you're not sure where that lives. Or try doing it in the Filter option of the Sequential file stage as others have mentioned.

Or you could build a 'batch' job that did the same using DSExecute and run it before the processing job, but I'd suggest one of the previous methods unless you've coded batch jobs before.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

DSguru2B wrote:where file1, file2, file3 and fileAll should be fully qualified.
Or you can 'cd' and then cat in two steps:

cd #P_FILE_DIR# && cat *.csv > cat_file.csv
This also shows that you can use your job parameters in the commands as well. You are using parameters for things like this, yes? :wink:
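
One caveat with the wildcard form, offered as a sketch and not as a tested DataStage recipe: the output name cat_file.csv itself matches *.csv, so on a rerun the previous output is picked up as an input (depending on the cat implementation, that is an error or a corrupted file). The function below writes to a name the pattern cannot match. The .all suffix and the function name combine_csv are made up for illustration.

```shell
#!/bin/sh
# combine_csv DIR
# Concatenates every *.csv in DIR into DIR/cat_file.all.
# The output suffix is deliberately not ".csv" (an assumption of this
# sketch), so a rerun does not read the previous output as an input.
combine_csv() {
    (
        # Subshell: the cd does not affect the caller's working directory.
        cd "$1" || exit 1
        # Drop the previous result so stale data cannot survive a failed run.
        rm -f cat_file.all
        cat *.csv > cat_file.all
    )
}
```

In a before-job ExecSH call, the equivalent one-liner would be along the lines of cd #P_FILE_DIR# && rm -f cat_file.all && cat *.csv > cat_file.all.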
-craig

"You can never have too many knives" -- Logan Nine Fingers
sashah
Participant
Posts: 37
Joined: Thu May 10, 2007 3:02 pm

Concatenation of Files

Post by sashah »

Thank you for your reply. I will try it.
sashah
Participant
Posts: 37
Joined: Thu May 10, 2007 3:02 pm

Execution Of DataStage Job

Post by sashah »

Is there a way to execute a DataStage job from a shell script? I have a Unix script which reads the files in a directory, and I would like to execute the DataStage job for each file.

Thanks for your help
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A 'better' answer would be to build a Sequence job using the new Start Loop, End Loop and UserVariables stages to get the list of files and run the processing job iteratively, passing it a new filename each time.

What version of DataStage do you have? Is it one of the 7.5.x releases with the stages I mentioned?

Or if you are more comfortable with a shell script, search the forum for the dsjob command, which is how you launch a job from the command line. It is documented in the Server Job Developer's Guide pdf in the Command Line Interface section near the end of the guide.
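
For the shell-script route, a minimal sketch of a per-file loop around dsjob. The parameter name P_FILE_NAME, the *.csv pattern and the function name are assumptions for illustration, so substitute whatever your job defines. The exit-code handling assumes the documented -jobstatus behaviour (1 = finished OK, 2 = finished with warnings), which is worth checking against the Command Line Interface section for your release.

```shell
#!/bin/sh
# run_job_per_file PROJECT JOB DIR
# Runs the DataStage job once per *.csv file in DIR, passing the file's
# full path as a job parameter (P_FILE_NAME is a hypothetical name).
# Set DSJOB to the path of the dsjob binary if it is not on the PATH.
run_job_per_file() {
    project=$1
    job=$2
    dir=$3
    for f in "$dir"/*.csv; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -f "$f" ] || continue
        status=0
        "${DSJOB:-dsjob}" -run -jobstatus -param "P_FILE_NAME=$f" \
            "$project" "$job" || status=$?
        # With -jobstatus, dsjob waits for the job and exits with its status.
        if [ "$status" -ne 1 ] && [ "$status" -ne 2 ]; then
            echo "job did not finish cleanly for $f (status $status)" >&2
            return 1
        fi
    done
    return 0
}
```

A call would look like run_job_per_file MyProject MyLoadJob /some/input/dir, where the project, job and directory names are placeholders. The loop stops at the first file whose run does not finish OK or with warnings.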
-craig

"You can never have too many knives" -- Logan Nine Fingers
sashah
Participant
Posts: 37
Joined: Thu May 10, 2007 3:02 pm

Post by sashah »

Thank you. I will try with the Sequence job.
sashah
Participant
Posts: 37
Joined: Thu May 10, 2007 3:02 pm

Post by sashah »

Is there anywhere that I can see an example of using the Start Loop, End Loop and UserVariables stages?

Thank you