Multiple source files single ETL
Hi all:
I have 4 files, named as below:
filename_ABC.csv
filename_DEF.csv
filename_MNU.csv
filename_XYZ.csv
All four files are in the same format/structure but contain different business data. I have an ETL job that loads "filename_ABC.csv" into an Oracle table.
My question is: can I use this single ETL job to load all 4 files every month? All files will land on the FTP server at the same time.
Please throw some light on this.
thanks in advance
Thank you,
Anu
If the metadata of all four files is identical and they flow through the same set of transformations and rules, then yes, you can use that one job to process all four files.
You can make your job multi-instance. Provide the source file name as a job parameter, and run the instances as separate threads at the same time.
Or
you could combine all of your source files into one file and then process that.
The combining can be done either via a Link Collector or by executing the cat command in the before-job subroutine.
So you have a couple of options.
Cheers
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
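As a rough illustration of the multi-instance option, the sketch below starts one invocation per file from the shell. The project name MyProject, the job name LoadFile and the parameter name SourceFilenameSuffix are all hypothetical, and DSJOB defaults to a dry run that only prints the commands; check the dsjob options against your own installation before relying on them.

```shell
# Hypothetical names: project "MyProject", multi-instance job "LoadFile",
# job parameter "SourceFilenameSuffix". None of these come from a real system.
# DSJOB defaults to "echo dsjob", so the sketch only prints each command (dry run).
DSJOB=${DSJOB:-echo dsjob}

run_all_instances() {
    for suffix in "$@"; do
        # One invocation per file; the invocation id after the dot
        # keeps the instances of the same job apart.
        $DSJOB -run -param "SourceFilenameSuffix=$suffix" MyProject "LoadFile.$suffix" &
    done
    wait   # return only after every background invocation has returned
}
```

Calling `run_all_instances ABC DEF MNU XYZ` prints four dsjob commands, in no guaranteed order because they run in the background.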
Re: Multiple source files single ETL
Hi
I have a similar issue, but my question is: how does the DataStage job get the source file name if we parameterize the source filename?
Thanks,
donny
Welcome aboard. :D
Your question suggests some unfamiliarity with job parameters. If you define a job parameter, then you can use a reference to that job parameter (surrounded by "#" characters) in your sequential file stage.
However, this would only let you process a single file.
To process multiple, identically-defined files, a simple approach is to declare that the Sequential File stage uses a filter command, and to provide a suitable filter command (such as cat *.csv) that can generate a single stream of rows out of the multiple files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
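To make the filter idea concrete, here is a minimal sketch of what a command such as cat *.csv produces: the rows of every matching file, one file after another, as a single stream. The function name stream_csv is only for illustration, and the directory is whatever you pass in.

```shell
# stream_csv DIR: emit every .csv file in DIR as one continuous stream of rows.
# The shell expands the glob in sorted order, so filename_ABC.csv comes
# before filename_DEF.csv, and so on.
stream_csv() {
    cat "$1"/*.csv
}
```

Note that nothing is stripped here: any header or trailer line in each file ends up in the stream as well.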
ray.wurlod wrote:
Welcome aboard. :D
Your question suggests some unfamiliarity with job parameters. If you define a job parameter, then you can use a reference to that job parameter (surrounded by "#" characters) in your sequential file stage.
However, this would only let you process a single file.
To process multiple, identically-defined files, a simple approach is to declare that the Sequential File stage uses a filter command, and to provide a suitable filter command (such as cat *.csv) that can generate a single stream of rows out of the multiple files.
Ray, thanks for the reply.
My files contain a HEADER and a TRAILER. I need to strip them out and load only the detail records (between H and T). This is also a delta load (using CRC).
So I cannot cat the 4 files into a single file.
Can I use a job parameter to pass 'ABC' ... 'XYZ', as I mentioned, into 'filename', so that it becomes 'filename_ABC' ... 'filename_XYZ'?
I am using the Sequential File stage.
Thanks in advance,
Thank you,
Anu
Thanks, DSguru2B.
DSguru2B wrote:
If the metadata of all four files is identical and they flow through the same set of transformations and rules, then yes, you can use that one job to process all four files.
You can make your job multi-instance. Provide the source file name as a job parameter, and run the instances as separate threads at the same time.
Or
you could combine all of your source files into one file and then process that.
The combining can be done either via a Link Collector or by executing the cat command in the before-job subroutine.
So you have a couple of options.
Cheers
Thank you,
Anu
anu123 wrote:
Can I use Job Parameter to pass 'ABC'...'XYZ' as i mentioned to 'filename'. so that it will become 'filename_ABC' ....'filename_XYZ'..? I am using SEQ file stage.
Sure - you can parameterize as much of the filename as you need. Typical parameter usage would be one for the directory the file lives in and another for the actual filename, tacked together in the Filename field of the stage. Something like:
Code: Select all
#SourceFileDirectory#/#SourceFilename#
Or, to parameterize only the suffix of the filename:
Code: Select all
#SourceFileDirectory#/filename_#SourceFilenameSuffix#
-craig
"You can never have too many knives" -- Logan Nine Fingers
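To see what the suffix-only Filename pattern resolves to, the sketch below mimics the substitution in plain shell. DataStage does this itself at run time; the sed call here is only a stand-in. The directory /data/landing is a made-up example, and passing "ABC.csv" as the suffix value is an assumption about how the extension would be supplied.

```shell
# resolve_path DIR SUFFIX: mimic how the Filename field pattern would resolve
# once both job parameters have values. Illustration only, not a DataStage API.
resolve_path() {
    template='#SourceFileDirectory#/filename_#SourceFilenameSuffix#'
    printf '%s\n' "$template" |
        sed -e "s|#SourceFileDirectory#|$1|" -e "s|#SourceFilenameSuffix#|$2|"
}

# One resolved path per monthly file (hypothetical landing directory).
for suffix in ABC DEF MNU XYZ; do
    resolve_path /data/landing "$suffix.csv"
done
```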
No, I think Ray meant exactly what he said. I'm assuming the intent with the piped pair of commands would be to first get everything but the first line and then everything but the last line. When you're done, all you've got left is the creamy goodness in the middle of the cookie. With yours, Kumar, you can get the header and trailers separately.
However, I must say that I've never been able to make this oft-repeated bit of advice work. Perhaps it's an HP-UX thing, but 'head +1' is invalid syntax and 'tail +1' gives you the entire file. Both commands really want negative numbers.
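Where the head/tail pair misbehaves, one alternative (a different technique from the one discussed above, not a fix for it) is to let sed delete the first and last lines in a single pass. A minimal sketch, with a function name chosen only for illustration:

```shell
# strip_header_trailer [FILE]: print the input without its first and last line.
# Reads standard input when no file is given. Pass one file at a time:
# sed treats several files as one stream, so only the very first and
# very last lines overall would be removed.
strip_header_trailer() {
    sed -e '1d' -e '$d' "$@"
}
```

For example, `strip_header_trailer filename_ABC.csv` would leave only the rows between the header and the trailer.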
-craig
"You can never have too many knives" -- Logan Nine Fingers
For such a case, the following command can be used. It gives the file without the trailer.
Code: Select all
head -$(expr $(wc -l $FILE | awk '{ print $1 }') \- 1) $FILE
tail +2 gives the file without the header. Both commands can be piped.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
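On systems where the older `head -N` and `tail +N` spellings are rejected, the same two steps can be written with the `-n` option. This is a sketch under that assumption; the function names are made up for the example.

```shell
# drop_header FILE: print everything from line 2 onwards.
drop_header() {
    tail -n +2 "$1"
}

# drop_trailer FILE: print all lines except the last one.
# wc -l counts newline characters, so this assumes the last line
# of the file is terminated by a newline.
drop_trailer() {
    head -n "$(( $(wc -l < "$1") - 1 ))" "$1"
}
```

Piping one into the other, as in `drop_trailer file | tail -n +2`, leaves only the detail rows.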
sendmk wrote:
How do I remove the first row from the top? I am not able to find the exact command to use in the filter command of the Sequential File stage. head -1 gives the header line; I want all lines but the header.
How do I go about it?
Thanks
Use
Code: Select all
tail +2 filename
Please read the previous post for removing the trailer.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
This command filters out the header record. How do I remove the footer row at the same time? Is there a shell script for this? Also, the head +1 command does not execute at all.
How do I go about it?
Thanks, Kumar
Have you checked the previous post?
You can use the following to strip out header and trailer.
Code: Select all
head -$(expr $(wc -l filename | awk '{ print $1 }') \- 1) filename | tail +2
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
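Putting the pieces of this thread together, the sketch below applies the head/tail pipeline to each file in turn and writes the detail rows of all of them as one stream. It uses the `-n` spellings of the options, and the function name detail_rows is hypothetical. Whether a combined stream suits a delta load is a separate question that this does not answer.

```shell
# detail_rows FILE...: for each file, drop its first line (header) and
# last line (trailer), then write the remaining rows to standard output.
# Assumes every file ends with a newline, since wc -l counts newlines.
detail_rows() {
    for f in "$@"; do
        n=$(( $(wc -l < "$f") - 1 ))   # number of lines without the trailer
        head -n "$n" "$f" | tail -n +2
    done
}
```

For example, `detail_rows filename_ABC.csv filename_DEF.csv` would print the detail rows of both files without any header or trailer lines.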