Multiple source files single ETL

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

anu123
Premium Member
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Multiple source files single ETL

Post by anu123 »

Hi all:

I have 4 files named as below..

filename_ABC.csv
filename_DEF.csv
filename_MNU.csv
filename_XYZ.csv

All four files are in same format/structure, but contains different business data. I have an ETL to load "filename_ABC.csv" into Oracle table.

My question is...can I use this single ETL to load all 4 files every month? All files will land in FTP server at same time.

Please through some light ...

thanks in advance
Thank you,
Anu
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If the metadata of all the four files is identical and they flow through the same set of transformations and rules. Then yes you can use that one job to process all the four files.
Your can make your job multi-instance. Provide the source file names as a job parameter. And run those jobs as seperate threads at the same time.
Or
you could combine all of your sourcefiles into one file and then process that.
The combining can be done either via link collector or by executing the cat command in the before rouinte
So you have a couple of options.
CHeers 8)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
donny
Premium Member
Premium Member
Posts: 15
Joined: Thu Dec 15, 2005 11:45 am

Re: Multiple source files single ETL

Post by donny »

Hi
even i have a similar issue but question is how the datastage job gets the sourec file name if we parameterize the source filename

thanks
donny
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

Your question suggests some unfamiliarity with job parameters. If you define a job parameter, then you can use a reference to that job parameter (surrounded by "#" characters) in your sequential file stage.

However, this would only let you process a single file.

To process multiple, identically-defined files, a simple approach is to declare that the Sequential File stage uses a filter command, and to provide a suitable filter command (such as cat *.csv) that can generate a single stream of rows out of the multiple files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
anu123
Premium Member
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

ray.wurlod wrote:Welcome aboard. :D

Your question suggests some unfamiliarity with job parameters. If you define a job parameter, then you can use a reference to that job parameter (surrounded by "#" characters) in your sequential file stage.

However, this would only let you process a single file.

To process multiple, identically-defined files, a simple approach is to declare that the Sequential File stage uses a filter command, and to provide a suitable filter command (such as cat *.csv) that can generate a single stream of rows out of the multiple files.

Ray, Thanks for the reply.
My files contain HEADER and TRAILER. I need to strip them out and load only detail (in between H & T). And this is delta load(using CRC).
I can not CAT 4 files into a single file.

Can I use Job Parameter to pass 'ABC'...'XYZ' as i mentioned to 'filename'. so that it will become 'filename_ABC' ....'filename_XYZ'..?
I am using SEQ file stage.

thanks in advance,
Thank you,
Anu
anu123
Premium Member
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

DSguru2B wrote:If the metadata of all the four files is identical and they flow through the same set of transformations and rules. Then yes you can use that one job to process all the four files.
Your can make your job multi-instance. Provide the source file names as a job parameter. And run those jobs as seperate threads at the same time.
Or
you could combine all of your sourcefiles into one file and then process that.
The combining can be done either via link collector or by executing the cat command in the before rouinte
So you have a couple of options.
CHeers 8)
thanks DSguru2B.
Thank you,
Anu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

anu123 wrote:Can I use Job Parameter to pass 'ABC'...'XYZ' as i mentioned to 'filename'. so that it will become 'filename_ABC' ....'filename_XYZ'..? I am using SEQ file stage.
Sure - you can parameterize as much of the filename as you need. Typical parameter usage would be one for the directory the file lives in and another for the actual filename, tacked together in the Filename field of the stage. Something like:

Code: Select all

#SourceFileDirectory#/#SourceFilename#
Or you could parameterize a portion of the filename and hard-code another as you've noted. You could use multiple parameters which when combined together constitute your filename.

Code: Select all

#SourceFileDirectory#/filename_#SourceFilenameSuffix#
Whatever you need. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Of course you can cat all the files. You just need some extra commands in the pipeline (like tail +1 and head +1) to strip off the header and trailer lines.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Perhaps ray is saying about head -1 and tail -1.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, I think Ray meant exactly what he said. I'm assuming the intent with the piped pair of commands would be to first get everything but the first line and then everything but the last line. When you're done, all you've got left is the creamy goodness in the middle of the cookie. With yours, Kumar, you can get the header and trailers separately.

However, I must say that I've never been able to make this oft repeated bit of advice work. Perhaps it's an H-PUX thing but 'head +1' is invalid syntax and 'tail +1' gives you the entire file. Both commands really want negative numbers. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

For such case, the following command can be used.

Code: Select all

head -$(expr $(wc -l $FILE | awk '{ print $1 }') \- 1) $FILE
It gives the file without trailer.

tail +1 gives file with out header. Both commands can be piped.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sendmk
Charter Member
Charter Member
Posts: 136
Joined: Mon Oct 03, 2005 5:02 am

Post by sendmk »

how to remove first row and last row,
i am not able to get the exact command to use in the filter command of seq stage, head -1 and tail -1 gives header line and footer line, i want all lines, but header and footer

how to go abt?

thx
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

sendmk wrote:how to remove first row from top,
i am not able to get the exact command to use in the filter command of seq stage, head -1 gives header line, i want all lines, but header

how to go abt?

thx
Use

Code: Select all

tail +2 file name.
to remove the first line.
Pls read the previous post for removing trailer.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sendmk
Charter Member
Charter Member
Posts: 136
Joined: Mon Oct 03, 2005 5:02 am

Post by sendmk »

Use

Code: Select all

tail +2 file name.
[/quote]

this command filters the header record, how to remove footer row simultaneously . is there a shell script and the head +1 command does not execute at all

how to go abt

thx kumar
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Have you checked the previous post?
You can use the following to strip out header and trailer.

Code: Select all

head -$(expr $(wc -l filename | awk '{ print $1 }') \- 1) filename | tail +2
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply