Multiple source files single ETL

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sendmk
Charter Member
Posts: 136
Joined: Mon Oct 03, 2005 5:02 am

Post by sendmk »

kumar_s wrote:Have you checked the previous post?
You can use the following to strip out header and trailer.

head -$(expr $(wc -l filename | awk '{ print $1 }') \- 1) filename | tail +2
Yes, I was trying to execute the whole script in the filter command itself, without creating a shell script.

Anyway,

thanks, Kumar, for the reminder.
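Here is a minimal sketch of the same header/trailer stripping in a form short enough to run inline. It assumes a POSIX shell, and the function names are only illustrative. The `head -N` and `tail +N` forms used above are older syntax that some newer systems may reject, so variant 1 spells them `head -n` and `tail -n +2`. Variant 2 does the whole job with a single `sed '1d;$d'`.

```shell
#!/bin/sh
# Sketch: drop the header (first line) and trailer (last line) of a file.
# Both functions print the remaining lines to stdout.

# Variant 1: the head/tail idea from the thread, using POSIX option syntax.
# "wc -l < file" prints only the count, so the awk step is not needed.
strip_with_head_tail() {
  lines=$(wc -l < "$1")
  head -n "$((lines - 1))" "$1" | tail -n +2
}

# Variant 2: one sed call; '1d' deletes line 1, '$d' deletes the last line.
strip_with_sed() {
  sed '1d;$d' "$1"
}
```

For example, `strip_with_sed input.txt` prints everything except the first and last lines of `input.txt`.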
sendmk
Charter Member
Posts: 136
Joined: Mon Oct 03, 2005 5:02 am

Post by sendmk »

I am not sure what the awk '{ print $1 }' does in this expression:

head -$(expr $(wc -l filename | awk '{ print $1 }') \- 1) filename | tail +2
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

wc -l returns the number of lines in the file, followed by the filename.
awk '{ print $1 }' is used just to retrieve the number alone.
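Here is a small sketch of both ways to get the bare count, assuming a POSIX shell. The function names are illustrative. When wc reads the file on standard input, it has no filename to print, so the awk step becomes unnecessary.

```shell
#!/bin/sh
# "wc -l file" prints "<count> <filename>", e.g. "4 data.txt".
# awk '{ print $1 }' keeps only the first whitespace-separated field: the count.
count_with_awk() {
  wc -l "$1" | awk '{ print $1 }'
}

# With the file on stdin, wc prints only the count (no filename).
# Some wc implementations pad the count with spaces; $((...)) normalises it.
count_from_stdin() {
  echo "$(( $(wc -l < "$1") ))"
}
```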
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sendmk
Charter Member
Posts: 136
Joined: Mon Oct 03, 2005 5:02 am

Post by sendmk »

Got it.

Thanks, Kumar.
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

chulett wrote:
anu123 wrote:Can I use a job parameter to pass 'ABC'...'XYZ', as I mentioned, to 'filename', so that it becomes 'filename_ABC'...'filename_XYZ'? I am using a Sequential File stage.
Sure - you can parameterize as much of the filename as you need. Typical parameter usage would be one for the directory the file lives in and another for the actual filename, tacked together in the Filename field of the stage. Something like:

#SourceFileDirectory#/#SourceFilename#
Or you could parameterize a portion of the filename and hard-code another as you've noted. You could use multiple parameters which when combined together constitute your filename.

#SourceFileDirectory#/filename_#SourceFilenameSuffix#
Whatever you need. :wink:

Thank you one and all.
Chulett,

How can I pass values to "#SourceFilenameSuffix#" (i.e. 'ABC', 'XYZ'...)? When I start the job, it should be able to read 4 separate files...
#/filename_#_ABC
#/filename_#_XYZ
#/filename_#_MNU ...etc.

Can I have a sequencer above the job which passes 'ABC','XYZ'... to filename and runs the job 4 times (the number of files I have) and loads the table?

It may sound like a basic question for you. Sorry about that.

thanks in advance
Thank you,
Anu
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

anu123 wrote:How can I pass values to "#SourceFilenameSuffix#" (i.e. 'ABC', 'XYZ'...)? ... Can I have a sequencer above the job which passes 'ABC','XYZ'... to filename and runs the job 4 times (the number of files I have) and loads the table?
You can have "#SourceFilenameSuffix#" as a comma-separated string such as ABC,XYZ,MNU etc. and load it into User Status. In the job sequence, have a loop activity which reads this comma-separated string with comma as the delimiter. On each iteration, call the job with the loop counter as the job parameter. The job will then execute once for each file name suffix.
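The loop logic can be sketched in shell to show what the sequence does with the list: split the comma-separated string and make one job call per element. `run_job` is a hypothetical placeholder for the Job Activity and only echoes here. In the sequence itself, the StartLoop/EndLoop activities would do the splitting.

```shell
#!/bin/sh
# Sketch of a list loop: split a comma-delimited string and handle
# one element per iteration, like a StartLoop/EndLoop pair in a sequence.

# Hypothetical stand-in for the Job Activity; replace with a real job call.
run_job() {
  echo "run job with SourceFilenameSuffix=$1"
}

loop_over_list() {
  list=$1
  old_ifs=$IFS
  IFS=,      # split the unquoted $list on commas only
  set -f     # turn off globbing so elements are never treated as patterns
  for suffix in $list; do
    run_job "$suffix"
  done
  set +f
  IFS=$old_ifs
}
```

For example, `loop_over_list "ABC,XYZ,MNU"` calls `run_job` three times, once per suffix.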
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Anu,
You can create a multiple-instance job. Since all the files have the same format, you can call the job 4 times in the job sequence with the 4 different parameters, as mentioned.
Or you can concatenate all the files in a single stage by using cat filename_#_*
Or you can use the file pattern property to read files matching the above pattern.
If every file has its own header and trailer, a multiple-instance job will be the fair and easy solution.
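Here is why headers and trailers matter for the cat approach. A plain cat would leave every file's header and trailer in the middle of the combined data. The sketch below shows the extra work you would need if you still wanted one stream. It assumes each file has exactly one header line and one trailer line, and it takes the directory and prefix as arguments rather than using names from the thread.

```shell
#!/bin/sh
# Sketch: concatenate every <dir>/<prefix>* file into one stream,
# dropping each file's own header (first line) and trailer (last line).
# Assumes each file has exactly one header line and one trailer line.
concat_without_headers() {
  dir=$1
  prefix=$2
  for f in "$dir"/"$prefix"*; do
    [ -f "$f" ] || continue   # no match: the pattern stays literal, so skip it
    sed '1d;$d' "$f"
  done
}
```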
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

anu123 wrote:Can I have a sequencer above the job which passes 'ABC','XYZ'.....to filename and runs the job 4 ( the no of files I have) times. and loads table.
Short answer is 'yes'. How you architect that is another matter.

Do you know the four values ahead of time or do they change from run to run? If they are fixed in number and constant, your job is cake. Create a Sequence job that calls the same processing job 4 times in a row, one after another, and pass in a new value for the parameter each time. Or make the job a Multi Instance job (if that's applicable) and run it four times in parallel.
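For the fixed-list case, a Sequence job with one Job Activity per value is the natural tool. The same one-after-another behaviour can be sketched from the command line with the dsjob client, assuming dsjob is on the PATH of the DataStage server. The project name MyProject, job name LoadFile and parameter SourceFilenameSuffix are hypothetical.

```shell
#!/bin/sh
# Sketch: run the same job once per suffix, one run after another.
# Assumes the dsjob command-line client is on the PATH.
PROJECT=MyProject      # hypothetical project name
JOB=LoadFile           # hypothetical job name

run_in_sequence() {
  for suffix in "$@"; do
    # -wait makes dsjob block until this run finishes before the next starts.
    dsjob -run -wait -param SourceFilenameSuffix="$suffix" "$PROJECT" "$JOB"
  done
}
```

For example, `run_in_sequence ABC XYZ MNU` runs the job three times and passes a different suffix each time.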

If you don't know the filenames to process, the solution gets more involved. You need a mechanism to get a current list of filenames and then one to pass those into the processing job. That's where you would read up on the Start Loop and End Loop stages and use them in conjunction with the User Variables stage. They allow you to take a delimited list and call the same job (or series of jobs) in a looping construct, peeling different values off the delimited list for each iteration of the loop. There are examples of doing this in the manual.
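Here is one possible mechanism for getting the current list, sketched in shell. It scans a directory for files named with a fixed prefix and prints the varying suffixes as a single comma-delimited string, which a loop can then peel apart. The directory and prefix are assumptions. So is the idea of capturing the output in the sequence, for example through an Execute Command activity.

```shell
#!/bin/sh
# Sketch: list <dir>/<prefix>* files and print their suffixes as
# one comma-delimited string, e.g. "ABC,MNU,XYZ".
suffix_list() {
  dir=$1
  prefix=$2
  out=
  for f in "$dir"/"$prefix"*; do
    [ -f "$f" ] || continue      # no match: skip the literal pattern
    name=${f##*/}                # strip the directory part
    suffix=${name#"$prefix"}     # strip the fixed prefix
    out=${out:+$out,}$suffix     # add a comma only between elements
  done
  printf '%s\n' "$out"
}
```

For example, `suffix_list /data/in filename_` would print `ABC,MNU,XYZ` if that directory held filename_ABC, filename_MNU and filename_XYZ.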
-craig

"You can never have too many knives" -- Logan Nine Fingers
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

balajisr wrote:You can have "#SourceFilenameSuffix#" as a comma-separated string such as ABC,XYZ,MNU etc. and load it into User Status. In the job sequence, have a loop activity which reads this comma-separated string with comma as the delimiter. On each iteration, call the job with the loop counter as the job parameter. The job will then execute once for each file name suffix.

Thanks Balaji and Kumar.
I have created a Sequence job with 4 Job Activities, each with a different parameter value ('#FilenameSuffix#'). These 4 Job Activities call the underlying single job 4 times.

Balaji, could you please explain how we can load ABC, MNU, ... into User Status? Excuse my ignorance.

thanks in advance.
Thank you,
Anu
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Create a routine and send the comma-delimited string as an argument to the routine. Use the DSSetUserStatus function in your routine to set the value in the user status. Retrieve the user status from the next job in the job sequence.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

One big problem with this approach - jobs have a 'User Status' area, routines don't. Besides, if you've already got a delimited string of all the values, do you really need to go to all that effort to park it in User Status? Put it in a User Variables stage instead... assuming you have 7.5.x installed. :wink:

Take the User Status approach if the values are being derived inside a Server job and need to be passed back out for others to use.
-craig
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

chulett wrote:Do you know the four values ahead of time or do they change from run to run? If they are fixed in number and constant, your job is cake. Create a Sequence job that calls the same processing job 4 times in a row, one after another, and pass in a new value for the parameter each time. Or make the job a Multi Instance job (if that's applicable) and run it four times in parallel.

Thanks, everyone, for your valuable inputs.

I have a sequence job with 4 Job Activities in a row that call the same job 4 times with different file names. The 4 values are constant.

I am not sure about multi-instance jobs and custom routines. I feel these are somewhat advanced concepts for me.

Thank you one and all for your valuable time.
Thank you,
Anu
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Anu

Open the Job Properties and tick the Allow Multiple Instance box. It's that easy to make a job multiple instance. When running it, you need to supply an invocation ID, which identifies each instance run of that job. For custom routines, you can start by looking at the routines that come with DataStage to get an idea of how to create your own.

Thanks
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

rasi wrote:It's that easy to make a job multiple instance. When running it, you need to supply an invocation ID, which identifies each instance run of that job.
Thank you, Siva. I made the job multi-instance. For now I have a sequence job above this job that calls it 4 times in a row.

Now that this job is multi-instance, the 4 Job Activities in my sequence job don't need to be in a row, right? They can be called in parallel and the 4 files processed simultaneously. I'm just trying to confirm that I understand it correctly. In this case, how is the invocation ID passed to the server job?

Thank you once again.
Thank you,
Anu
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Anu

Once you have made your job a multi-instance job, open your sequence job: each Job Activity will have an Invocation Id field under the job name. This is where you type in the invocation ID for each instance. Hope this helps.
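Outside the Designer, the invocation ID goes with the job name: dsjob addresses one instance of a multi-instance job as job.invocationid. The sketch below starts one instance per suffix in the background and waits for all of them to finish. It assumes dsjob is on the PATH. MyProject, LoadFile and SourceFilenameSuffix are hypothetical names, and reusing the suffix as the invocation ID is just one convenient choice.

```shell
#!/bin/sh
# Sketch: start one instance of a multi-instance job per suffix, in parallel.
# Assumes the dsjob client is on the PATH; the names below are hypothetical.
PROJECT=MyProject
JOB=LoadFile

run_in_parallel() {
  for suffix in "$@"; do
    # JOB.INVOCATIONID selects one instance; the suffix doubles as the ID.
    dsjob -run -wait -param SourceFilenameSuffix="$suffix" \
      "$PROJECT" "$JOB.$suffix" &
  done
  wait   # return only after every background dsjob call has finished
}
```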
Regards
Siva
