Running sequence multiple times with different parameters

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

khron
Participant
Posts: 12
Joined: Wed Dec 28, 2005 6:49 am

Running sequence multiple times with different parameters

Post by khron »

I have a sequence that takes some arguments as job parameters, and a CSV file containing a series of argument sets for multiple runs of this sequence. Is there any way I can loop through each record in the CSV file and run the job with the parameters from that record? The CSV file is likely to contain several thousand records.

To make it clearer, here is an example:
I have a job ImportFeed which has these job parameters:
- file_path
- customer_id
And I have a CSV file with the following fields:
file_path, customer_id
I want a way to run the ImportFeed job with parameters from the CSV file.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

There are three basic approaches you might take; the first one is the simplest for you:

a) write a shell or awk script to read and parse the .CSV file into parameters, then use the dsjob command-line interface to call your DataStage job and pass the parameters to it.

b) write a DataStage job that reads your .CSV file and, in a transform stage, uses the SDK routine calls to execute your job with the parameters. I'm not at a DS site at the moment, so I can't give you the exact routine names; you'll have to find them in the list. You could also write your own DataStage/BASIC routine to do this, using some of the code posted in this forum as a template. Alternatively, you could call a shell with DSExecute and use the same command-line interface as in (a) to execute the job.

c) write some custom code in DataStage/BASIC to read the .CSV file, parse the elements and start your processing job.

All three approaches can run the job in multi-instance mode in parallel, provided your job supports it.
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi,

We use a tool called MetaController. It's a scheduling tool that reads data from a table.

If you enter the required data in the table, MetaController triggers the sequence multiple times with multiple parameter sets.

Regards
Sreeni
khron
Participant
Posts: 12
Joined: Wed Dec 28, 2005 6:49 am

Post by khron »

Thanks for the reply.
I don't want to mess with external scripts, so I will probably not go for the first solution.
The transformer stage looks better. Where should I call the SDK? Should I make a stage variable "JobReturn" which calls the job for each record (using the SDK)?
Then the output of the transformer could be a nice CSV file with the status of the jobs.
This is great because I can probably split the input across several such transformers to get some parallelism.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

khron,

you won't need to split the job across multiple transformers - when you initiate the run of a job, you control whether the calling process waits for that job to complete or continues on without waiting. As I stated in my first post, I don't have access to DS right now, but with just a little searching through the routines in the SDK (probably less than a minute) you'll find the appropriate subroutine call. I think it might be easiest to use the SDK version of DSExecute and issue your call to dsjob directly from the transform stage variable.
khron
Participant
Posts: 12
Joined: Wed Dec 28, 2005 6:49 am

Post by khron »

ArndW,

Spawning jobs like this in the background is likely to hog the machine if I have something like 10k records in the CSV file. By splitting the file I control how many execution threads I have.
I'm sure I can find that SDK function... don't worry.
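If the concern is just capping concurrency, the dsjob route from option (a) can also be throttled from the shell side without splitting the file. A sketch using xargs -P (the project/job names and sample rows are invented, and "echo" again stands in for the real dsjob call):

```shell
#!/bin/sh
# Throttled launcher: xargs -P keeps at most N launches in flight at once.
# "MyProject", "ImportFeed", and the sample data are assumptions.

CSV=$(mktemp)
printf '%s\n' \
    '/data/feeds/a.txt,1001' '/data/feeds/b.txt,1002' \
    '/data/feeds/c.txt,1003' '/data/feeds/d.txt,1004' > "$CSV"

# -n 1 hands each CSV line to the inline runner as $1; -P 2 caps the
# number of concurrent runners at two.
xargs -n 1 -P 2 sh -c '
    file_path=${1%%,*}
    customer_id=${1##*,}
    echo dsjob -run -param "file_path=$file_path" \
        -param "customer_id=$customer_id" MyProject ImportFeed
' runner < "$CSV" > launched.txt

rm -f "$CSV"
```

Raising -P raises the cap, so the number of simultaneous instances is one knob instead of a manual file split.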

Sreenivasulu,
I'll take a look at MetaController. We do need a better scheduling solution. Any idea if they give out evaluation versions?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you want to make optimal use of your system and will have that many runs of your DataStage job, then a custom solution is best. I've done some BASIC coding in the past where I needed to do something similar to what you intend; the program ensures that <n> instances are always running. A pseudocode model looks like this:

Code:

<set Finished to FALSE>
<DIMension the JobHandle array to <n> for the number of concurrent jobs>
LOOP UNTIL Finished = TRUE
   <attach the job's next instance to the first free entry in the JobHandle array using DSAttachJob()>
   <set job parameters using DSSetParam()>
   <run the job without waiting using DSRunJob()>
   <check the JobHandle array for entries no longer in DSJS.RUNNING state using DSGetJobInfo()>
   <if none are free, sleep for 30 seconds and repeat the previous step>
   <if there is no more processing to do, set Finished = TRUE>
REPEAT