How to schedule a job to run every 5 minutes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rajeshknl
Participant
Posts: 22
Joined: Thu Jul 17, 2008 8:09 pm

How to schedule a job to run every 5 minutes

Post by rajeshknl »

Hello

I receive a file every 5 minutes (24x7). They are all of the same file type and are processed by the same job. The sizes of the files may vary; for example, file1 might be 10MB and file2 2MB.

The issue is that I don't want to stage the files and run them one after another, waiting for the previous file to be processed completely. I want to process each file as soon as I receive it.

I know the concept of multiple instances, but how do I use it? That is, how do I pass different filenames to different instances as parameters?

How do I keep track of which file is being processed, waiting, etc.?

I also want to limit the maximum number of instances to, say, 20 so that my box does not crash.

Any ideas/experiences/references are welcome.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Re: How to schedule a job to run every 5 minutes

Post by priyadarshikunal »

rajeshknl wrote:Hello

I receive a file every 5 minutes (24x7). They are all of the same file type and are processed by the same job. The sizes of the files may vary; for example, file1 might be 10MB and file2 2MB.

The issue is that I don't want to stage the files and run them one after another, waiting for the previous file to be processed completely. I want to process each file as soon as I receive it.

I know the concept of multiple instances, but how do I use it? That is, how do I pass different filenames to different instances as parameters?

How do I keep track of which file is being processed, waiting, etc.?

I also want to limit the maximum number of instances to, say, 20 so that my box does not crash.

Any ideas/experiences/references are welcome.

There are a few contradictions in what you want, but I hope I am answering it correctly.

What you need to do is:

1. Allow multiple instances for that job.
2. Check for a new file in your directory every five minutes. You may store the timestamp of the previous file and compare it with that of the latest one.
3. When you get a new file, pass its name as a parameter to the job.
4. The invocation id for each run needs to be unique and your instances should be dynamic, so you can use the current date and time as the invocation id.
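Step 4 above might look something like this in shell; the project and job names are placeholders, and the timestamp format is just one choice:

```shell
# Derive a unique invocation id from the current date and time, so each
# run of the multi-instance job gets its own instance.
invocationid=$(date +%Y%m%d%H%M%S)
echo "invocation id: $invocationid"
# dsjob -run MyProject "MyJob.$invocationid"    # the actual launch
```

Seconds-level resolution is enough here because at most one file arrives every five minutes.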

Also, since you don't want more than 20 instances, you can count the number of instances currently running with the command

Code: Select all

ps -fu <username> | grep <jobname> | grep -v grep | wc -l
If the count is 20 or more, hand the command to another script block that checks the number of running jobs every minute and, once the count falls, runs your command.

This is the best way I can think of right now (I will think about other aspects of this). Please correct me if I misunderstood your requirement.
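That "script block" idea could be sketched as below; DSUSER, PROJECT and JOBNAME are hypothetical placeholders for your environment:

```shell
# Throttle sketch: poll until fewer than MAX instances are running,
# then fire the job with the given invocation id.
MAX=20

# count live instances; grep -v grep keeps the pipeline itself out of the count
job_count() {
    ps -fu "$DSUSER" | grep "$JOBNAME" | grep -v grep | wc -l
}

throttle_and_run() {            # $1 = unique invocation id
    while [ "$(job_count)" -ge "$MAX" ]; do
        sleep 60                # re-check every minute, as described above
    done
    dsjob -run "$PROJECT" "$JOBNAME.$1"
}
```

One caveat with counting via `ps`/`grep`: a job name that is a substring of another job name will inflate the count, so the pattern may need anchoring in practice.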
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is a good case for writing specialized job control code.

Keep an array of job names currently executing, and a similar array of job handles.

You can discover all the information you need from these arrays.

Also keep a scalar variable recording the number of jobs currently running, and don't start any more until one (at least) finishes.

Remove the jobs from the arrays once you've finished any post-execution processing.
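Ray's approach would normally be DataStage BASIC job control (DSAttachJob/DSRunJob and friends), but the bookkeeping he describes can be sketched as a rough shell analogue. All names and structure below are my illustration, not Ray's actual code; the single list of invocation ids stands in for both arrays he mentions:

```shell
# Bookkeeping sketch: a list of currently running invocation ids plus a
# derived running count, capped at MAX concurrent instances.
MAX=20
running=""                      # space-separated invocation ids

count_running() {
    set -- $running             # word-split the list into positional args
    echo $#
}

start_job() {                   # $1 = invocation id
    if [ "$(count_running)" -lt "$MAX" ]; then
        running="$running $1"
        # dsjob -run "$PROJECT" "$JOBNAME.$1"   # actual launch (placeholder)
        return 0
    fi
    return 1                    # at capacity: caller must wait and retry
}

finish_job() {                  # $1 = invocation id, after post-processing
    new=""
    for id in $running; do
        [ "$id" = "$1" ] || new="$new $id"
    done
    running=$new
}
```

A controller loop would call start_job when a new file arrives, retry while it returns non-zero, and call finish_job only after any post-execution processing, as Ray describes.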
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
AmeyJoshi14
Participant
Posts: 334
Joined: Fri Dec 01, 2006 5:17 am
Location: Texas

Post by AmeyJoshi14 »

Hi,
As per priyadarshikunal's post, that is, the points mentioned in it, I have created a script which might help you solve your problem. :lol:
The script is:

Code: Select all

#!/bin/ksh
DIRPATH=/path/of/the/directory
#Assuming the directory is empty or running for the first time
cntprev=1
#This script will run continuously
while true
do
  cntorg=`ls -ltr "$DIRPATH" | wc -l`  #taking the count
  cntnew=`expr $cntorg - 1`
  #Now comparing the value
  if [ $cntnew -ne $cntprev ]
  then
    numjobs=`ps -fu <username> | grep <jobname> | grep -v grep | wc -l`
    #To check that not more than 20 instances are running
    if [ $numjobs -lt 20 ]
    then
      invoctid=$cntnew
      . $DSHOME/dsenv
      cd $DSHOME/bin
      ./dsjob -run Project_name Job_Name.$invoctid 2>/dev/null
    fi
  fi
  cntprev=$cntnew
  sleep 60
done
http://findingjobsindatastage.blogspot.com/
Theory is when you know all and nothing works. Practice is when all works and nobody knows why. In this case we have put together theory and practice: nothing works. and nobody knows why! (Albert Einstein)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would stick with the approach Ray mentioned; I have implemented the very same myself, or at least something very similar, using the techniques mentioned.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

AmeyJoshi, why do you discard stderr from the dsjob command? I would expect to have this information available in the event of failure.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
AmeyJoshi14
Participant
Posts: 334
Joined: Fri Dec 01, 2006 5:17 am
Location: Texas

Post by AmeyJoshi14 »

ray.wurlod wrote:AmeyJoshi, why do you discard stderr from the dsjob command? I would expect to have this information available in the event of failure. ...
Hi,
Since this script runs continuously, every successful run would show 'status code = 0', which is why I discarded stderr. I had not thought of this option.. :oops:
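For reference, one way to keep that information available, as Ray suggests, is to append both streams to a log file instead of discarding stderr. The log file name and the wrapper function here are just illustrations:

```shell
# Append each run's stdout and stderr to a log so failures remain
# diagnosable instead of vanishing into /dev/null.
LOGFILE=/tmp/dsjob_runs.log     # illustrative location

run_with_log() {                # $1 = invocation id
    dsjob -run "$PROJECT" "$JOBNAME.$1" >>"$LOGFILE" 2>&1
}
```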

Craig Guruji: I just thought of expanding priyadarshikunal's idea a bit :( ......that's why I posted it, nothing else :) and I also think Ray Guruji's option is good, in fact the best.. :)
http://findingjobsindatastage.blogspot.com/
Theory is when you know all and nothing works. Practice is when all works and nobody knows why. In this case we have put together theory and practice: nothing works. and nobody knows why! (Albert Einstein)