Multiple source files single ETL

anu123 · Post by **anu123** » Fri Mar 24, 2006 4:07 pm

Hi all:

I have four files, named as below:

filename_ABC.csv
filename_DEF.csv
filename_MNU.csv
filename_XYZ.csv

All four files have the same format/structure but contain different business data. I have an ETL job that loads "filename_ABC.csv" into an Oracle table.

My question is: can I use this single ETL job to load all four files every month? All the files will land on the FTP server at the same time.

Please throw some light on this.

Thanks in advance.

DSguru2B · Post by **DSguru2B** » Fri Mar 24, 2006 4:12 pm

If the metadata of all four files is identical and they flow through the same set of transformations and rules, then yes, you can use that one job to process all four files.
You can make your job multi-instance: provide the source file name as a job parameter and run the instances as separate threads at the same time.
Or
you could combine all of your source files into one file and then process that.
The combining can be done either via a Link Collector stage or by executing the cat command in the before-job routine.
So you have a couple of options.
Cheers

donny · Post by **donny** » Fri Mar 24, 2006 4:42 pm

Hi,
I have a similar issue, but my question is: how does the DataStage job get the source file name if we parameterize it?

Thanks,
donny

ray.wurlod · Post by **ray.wurlod** » Fri Mar 24, 2006 4:46 pm

Welcome aboard. :D

Your question suggests some unfamiliarity with job parameters. If you define a job parameter, then you can use a reference to that job parameter (surrounded by "#" characters) in your sequential file stage.

However, this would only let you process a single file.

To process multiple, identically-defined files, a simple approach is to declare that the Sequential File stage uses a filter command, and to provide a suitable filter command (such as cat *.csv) that can generate a single stream of rows out of the multiple files.
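
A minimal sketch of what such a filter amounts to, assuming the four files sit in one directory and follow the filename_*.csv pattern from the first post. The function name combine_sources and the directory argument are hypothetical, for illustration only.

```shell
# Hypothetical helper: emit every matching file as one stream of rows.
# The shell expands the glob in sorted order, so the row order is stable.
combine_sources() {
    dir=$1
    cat "$dir"/filename_*.csv
}
```

Called as combine_sources /path/to/landing/dir, this writes the rows of all matching files to standard output, which is the single stream the stage would then read. Any header or trailer rows in the files pass through unchanged.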

anu123 · Post by **anu123** » Fri Mar 24, 2006 7:21 pm

Ray, thanks for the reply.
My files contain a HEADER and a TRAILER. I need to strip them out and load only the detail rows (between H and T). This is also a delta load (using CRC).
I cannot cat the 4 files into a single file.

Can I use a job parameter to pass 'ABC' ... 'XYZ' into the 'filename' I mentioned, so that it becomes 'filename_ABC' ... 'filename_XYZ'?
I am using the Sequential File stage.

Thanks in advance,

anu123 · Post by **anu123** » Fri Mar 24, 2006 7:22 pm

Thanks, DSguru2B.

chulett · Post by **chulett** » Sat Mar 25, 2006 12:36 am

anu123 wrote:Can I use Job Parameter to pass 'ABC'...'XYZ' as i mentioned to 'filename'. so that it will become 'filename_ABC' ....'filename_XYZ'..? I am using SEQ file stage.

Sure - you can parameterize as much of the filename as you need. Typical parameter usage would be one for the directory the file lives in and another for the actual filename, tacked together in the Filename field of the stage. Something like:

Code: Select all

#SourceFileDirectory#/#SourceFilename#

Or you could parameterize a portion of the filename and hard-code another as you've noted. You could use multiple parameters which when combined together constitute your filename.

Code: Select all

#SourceFileDirectory#/filename_#SourceFilenameSuffix#

Whatever you need.
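
One way to drive a parameterized, multi-instance job once per file is a small shell loop around the dsjob command line. The sketch below only prints the commands (a dry run) rather than executing them. The project name MyProject, the job name LoadFile and the directory /data/incoming are placeholders, and the parameter names are the ones used in the examples above; check the dsjob options against your own installation before relying on this.

```shell
# Print one dsjob command per file suffix (dry run: nothing is executed).
# Project name, job name and source directory are hypothetical placeholders.
print_run_commands() {
    project=$1
    job=$2
    src_dir=$3
    shift 3
    for suffix in "$@"; do
        # "$job.$suffix" uses the suffix as the invocation id of a
        # multi-instance job, so each file gets its own instance.
        printf '%s\n' "dsjob -run -param SourceFileDirectory=$src_dir -param SourceFilenameSuffix=$suffix $project $job.$suffix"
    done
}

print_run_commands MyProject LoadFile /data/incoming ABC DEF MNU XYZ
```

Each printed line would start one instance whose Filename field resolves to filename_ABC, filename_DEF and so on, depending on the suffix passed.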

ray.wurlod · Post by **ray.wurlod** » Sat Mar 25, 2006 1:12 am

Of course you can cat all the files. You just need some extra commands in the pipeline (like tail +1 and head +1) to strip off the header and trailer lines.

kumar_s · Post by **kumar_s** » Sat Mar 25, 2006 3:55 am

Perhaps Ray means head -1 and tail -1.

chulett · Post by **chulett** » Sat Mar 25, 2006 7:47 am

No, I think Ray meant exactly what he said. I'm assuming the intent with the piped pair of commands would be to first get everything but the first line and then everything but the last line. When you're done, all you've got left is the creamy goodness in the middle of the cookie. With yours, Kumar, you can get the header and trailers separately.

However, I must say that I've never been able to make this oft-repeated bit of advice work. Perhaps it's an HP-UX thing, but 'head +1' is invalid syntax and 'tail +1' gives you the entire file. Both commands really want negative numbers.
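
For reference, a small sketch using the POSIX -n spelling of the same idea. tail -n +K starts printing at line K, so +1 is the whole file and +2 is everything after the header. head has no "+" form, so the sketch swaps in sed '$d' to delete the last line. The function names are hypothetical, and both read standard input so they can be chained in a pipeline.

```shell
# Drop the first line: tail -n +2 starts output at line 2.
drop_header() {
    tail -n +2
}

# Drop the last line: in sed, the address $ matches the last input line.
drop_trailer() {
    sed '$d'
}
```

For a single file, drop_header < filename | drop_trailer leaves only the detail rows in the middle.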

kumar_s · Post by **kumar_s** » Sat Mar 25, 2006 10:28 pm

In that case, the following command can be used.

Code: Select all

head -$(expr $(wc -l $FILE | awk '{ print $1 }') \- 1) $FILE

It gives the file without the trailer.

tail +2 gives the file without the header. Both commands can be piped together.

sendmk · Post by **sendmk** » Mon Mar 27, 2006 3:11 am

How do I remove the first row and the last row?
I am not able to find the exact command to use in the filter command of the Sequential File stage. head -1 and tail -1 give the header line and the footer line; I want all the lines except the header and footer.

How do I go about it?

Thanks

kumar_s · Post by **kumar_s** » Mon Mar 27, 2006 3:20 am

sendmk wrote:how to remove first row from top,
i am not able to get the exact command to use in the filter command of seq stage, head -1 gives header line, i want all lines, but header

how to go abt?

thx

Use

Code: Select all

tail +2 filename

to remove the first line.
Please read the previous post for removing the trailer.

sendmk · Post by **sendmk** » Mon Mar 27, 2006 3:25 am

kumar_s wrote:Use

Code: Select all

tail +2 filename

This command filters out the header record. How do I remove the footer row at the same time? Is there a shell script for this? Also, the head +1 command does not execute at all.

How do I go about it?

Thanks, kumar

kumar_s · Post by **kumar_s** » Mon Mar 27, 2006 3:31 am

Have you checked the previous post?
You can use the following to strip out header and trailer.

Code: Select all

head -$(expr $(wc -l filename | awk '{ print $1 }') \- 1) filename | tail +2
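
When several files each carry their own header and trailer, the stripping has to happen per file before the rows are concatenated. A possible sketch, assuming exactly one header line and one trailer line per file; the function name detail_rows is hypothetical, and sed is used here in place of the head/wc arithmetic above.

```shell
# Emit only the detail rows of each file given as an argument.
# sed runs once per file, so "1" and "$" refer to that file's own first
# and last line; passing all files to a single sed call would instead
# treat them as one long stream.
detail_rows() {
    for f in "$@"; do
        sed -e '1d' -e '$d' "$f"
    done
}
```

Something like detail_rows filename_*.csv then produces one stream of detail rows from all four files.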
