Processing files in a directory
Moderators: chulett, rschirm, roy
What would be the best way to load data from multiple files residing in a directory into an Oracle database? All the files have the same structure. I tried using the Folder stage but was not successful.
In the Sequential File stage specify a Filter command. The filter command to use is cat file1 file2 file3 file4... (or you could use wildcards if your file names are amenable). The Sequential File stage then reads the output of the cat command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, I recently found that I couldn't use wildcards in the filter command. Craig also found the same issue. Do you know that it can be done? Is there some specific syntax to get it to work? Or is it just not working in our versions of DS (7.5.1)?
Regards,
Nick.
Concatenation of files
Thank you for the reply. How would I go about concatenating the files from within DataStage?
chulett wrote: Welcome! :D
Typically, concatenate the files and then process the concatenated file. Or use a looping Sequence to read the filenames and repeat a processing job once per filename. ...
Craig and Ray have already replied to your question. Use the cat command in the before-job subroutine 'ExecSH' to concatenate all the files you want and redirect the output to a different file. Use this file inside your job.
Code:
cat file1 file2 file3 > fileAll
where file1, file2, file3 and fileAll should be fully qualified.
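As a minimal sketch of the before-job step described above, the command handed to ExecSH could look something like the following. The paths under /tmp/ds_demo are hypothetical stand-ins for your own fully qualified names, and the sample data is created only so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical locations; substitute your own fully qualified paths.
SRC_DIR=/tmp/ds_demo/in
OUT_FILE=/tmp/ds_demo/fileAll

# Create three sample files with the same structure (demo data only).
mkdir -p "$SRC_DIR"
printf '1,alpha\n' > "$SRC_DIR/file1"
printf '2,beta\n'  > "$SRC_DIR/file2"
printf '3,gamma\n' > "$SRC_DIR/file3"

# Concatenate in the order listed; '>' overwrites fileAll on every run,
# so rerunning the job does not append to an earlier result.
cat "$SRC_DIR/file1" "$SRC_DIR/file2" "$SRC_DIR/file3" > "$OUT_FILE"
```

The job's Sequential File stage would then read the single combined file instead of the individual inputs.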
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
The easiest way would be to run the wildcard 'cat' before-job using ExecSH, redirecting the output to a fixed filename. Check under the Job Properties tab if you're not sure where that lives. Or try doing it in the Filter option of the Sequential File stage as others have mentioned.
Or you could build a 'batch' job that did the same using DSExecute and run it before the processing job, but I'd suggest one of the previous methods unless you've coded batch jobs before.
-craig
"You can never have too many knives" -- Logan Nine Fingers
DSguru2B wrote: where file1, file2, file3 and fileAll should be fully qualified.
Or you can 'cd' and then cat in two steps:
Code:
cd #P_FILE_DIR# && cat *.csv > cat_file.csv
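One thing to watch with the wildcard form: the output name cat_file.csv itself matches *.csv, so on a rerun the previous output is part of what the shell expands. The sketch below is one possible way around that, writing the combined file to a separate directory. The shell variable P_FILE_DIR stands in for the job parameter #P_FILE_DIR#, and OUT_DIR and the sample files are hypothetical.

```shell
#!/bin/sh
# Stand-in for the job parameter #P_FILE_DIR# plus a separate,
# hypothetical output directory that the *.csv pattern cannot reach.
P_FILE_DIR=/tmp/ds_glob_demo/in
OUT_DIR=/tmp/ds_glob_demo/out

# Demo data only: two files with the same structure.
mkdir -p "$P_FILE_DIR" "$OUT_DIR"
printf '1,alpha\n' > "$P_FILE_DIR/a.csv"
printf '2,beta\n'  > "$P_FILE_DIR/b.csv"

# Same two-step idea as above, but the result lands outside the
# directory being globbed, so a rerun sees only the input files.
cd "$P_FILE_DIR" && cat *.csv > "$OUT_DIR/cat_file.csv"
```

Giving the output a different extension in the same directory would serve the same purpose.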
-craig
Concatenation of Files
Thank you for your reply. I will try it.
Execution Of DataStage Job
Is there a way to execute a DataStage job from a shell script? I have a Unix script which reads files from a directory, and I would like to execute the DataStage job once for each file.
Thanks for your help
A 'better' answer would be to build a Sequence job using the new Start Loop, End Loop and UserVariables stages to get the list of files and run the processing job iteratively, passing it a new filename each time.
What version of DataStage do you have, one of the 7.5.x releases with the stages I mentioned?
Or if you are more comfortable with a shell script, search the forum for the dsjob command, which is how you launch a job from the command line. It is documented in the Server Job Developer's Guide pdf in the Command Line Interface section near the end of the guide.
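For the shell-script route, a rough sketch of a per-file loop around dsjob might look like this. The project name, job name and the job parameter P_FILE_NAME are hypothetical, and the exit-code handling assumes that with -jobstatus a status of 1 means finished OK and 2 means finished with warnings; check the Command Line Interface section of the guide for your release before relying on it.

```shell
#!/bin/sh
# Sketch: run one DataStage job once per file in a directory via dsjob.
# DSJOB can be overridden, e.g. with the full path to dsjob on your server.
DSJOB=${DSJOB:-dsjob}

run_job_per_file() {
    dir=$1; project=$2; job=$3
    for f in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty directory.
        [ -f "$f" ] || continue

        # -jobstatus waits for the job to finish and reports its status
        # through the exit code. P_FILE_NAME is a hypothetical job parameter.
        rc=0
        "$DSJOB" -run -jobstatus -param "P_FILE_NAME=$f" "$project" "$job" || rc=$?

        # Assumption: 1 = finished OK, 2 = finished with warnings.
        case $rc in
            1|2) echo "OK   $f (status $rc)" ;;
            *)   echo "FAIL $f (status $rc)" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage (hypothetical names):
# run_job_per_file /data/incoming MyProject LoadOracleJob
```

Stopping at the first failed file is a design choice here; a variant could log the failure and carry on with the remaining files.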
-craig
Thank you. I will try with the Sequence job.
Is there anywhere I can see an example of using the Start Loop, End Loop and UserVariables stages?
Thank you