Running a DataStage job continuously

sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Hi All,

A set of files lands in a defined directory on the DataStage server.
The file names are:

EM01MMDD.M
EM11MMDD.M
EM13MMDD.M

The requirement is that the DataStage process should run continuously, checking for any of the three files above, and process a file as soon as it finds one.

How can I run a DataStage job continuously?
I know I can schedule it through a UNIX script, but that only runs it at a particular time. Is it possible to make it run continuously?
And how would the three file names be passed to a single multi-instance job?

Any help on this will be appreciated.

Regards,
Sjordery
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

Use a sequence with an infinite loop: check whether the file exists and, if it does, trigger the job; otherwise check again. You would also need to handle the abort condition for the job.
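As a rough shell equivalent of the steps above (a sketch, not a tested production script), the function below starts one invocation of a multi-instance job for a landed file and handles an abort separately. `MyProject`, `ProcessEMFile` and the `SourceFile` job parameter are hypothetical names. The exit codes in the comments are how `dsjob -jobstatus` is generally documented to behave, so check them against your version.

```shell
#!/bin/sh
# Sketch only: PROJECT, JOB and the SourceFile parameter are hypothetical names.
DSJOB=${DSJOB:-dsjob}
PROJECT=${PROJECT:-MyProject}
JOB=${JOB:-ProcessEMFile}

# Run one invocation of the multi-instance job for a single file.
# Returns 0 when the job finished (with or without warnings), 1 otherwise.
run_job() {
    file=$1
    name=$(basename "$file")
    # Use the first four characters (EM01, EM11 or EM13) as the invocation id.
    instance=$(printf '%s' "$name" | cut -c1-4)

    # With -jobstatus, dsjob waits for the job and derives its exit code from
    # the job status: 1 = finished OK, 2 = finished with warnings, 3 = aborted.
    "$DSJOB" -run -jobstatus -param "SourceFile=$file" "$PROJECT" "$JOB.$instance"
    status=$?

    case $status in
        1|2) return 0 ;;
        3)   # An aborted job has to be reset before it can be run again.
             "$DSJOB" -run -mode RESET -wait "$PROJECT" "$JOB.$instance"
             return 1 ;;
        *)   return 1 ;;
    esac
}
```

A polling loop would call `run_job "$file"` for each file it finds and decide, on a non-zero return, whether to retry, alert someone, or stop.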
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It would probably be a sequence, rather than an individual job, that runs continuously. What happens if another of the files appears while one is being processed? You have to be able to handle rapid arrival of files. Don't forget to be able to trigger the job to shut down gracefully, perhaps by checking for existence of another file (perhaps called .shutdown).
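The two concerns above (files arriving in quick succession, and a graceful shutdown) could be sketched in shell roughly as follows. The landing directory, the `.done` rename and the `process_file` hook are assumptions made for illustration; only the `.shutdown` file name comes from the suggestion above.

```shell
#!/bin/sh
# Sketch only: directory layout and the process_file hook are assumptions.
LANDING_DIR=${LANDING_DIR:-/data/landing}
SLEEP_SECS=${SLEEP_SECS:-60}

# Placeholder hook: replace the body with the real call that runs the job.
process_file() {
    echo "would process $1"
}

# Process every EM file present at the start of the pass. Anything that
# lands while the pass is running is caught by the next pass.
drain_once() {
    for f in "$LANDING_DIR"/EM01*.M "$LANDING_DIR"/EM11*.M "$LANDING_DIR"/EM13*.M
    do
        [ -f "$f" ] || continue      # an unmatched pattern stays literal
        # Rename on success so the file no longer matches the patterns.
        process_file "$f" && mv "$f" "$f.done"
    done
}

# Keep polling until a file named .shutdown appears in the landing directory.
watch_loop() {
    while [ ! -e "$LANDING_DIR/.shutdown" ]
    do
        drain_once
        [ -e "$LANDING_DIR/.shutdown" ] || sleep "$SLEEP_SECS"
    done
    rm -f "$LANDING_DIR/.shutdown"   # clear the flag for the next start
}
```

Calling `watch_loop` starts the polling, and `touch "$LANDING_DIR/.shutdown"` asks it to stop once the current pass has finished.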
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

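You can use the Wait For File activity within a sequence to check for the existence of the file and trigger the DataStage jobs depending on the result, or you can write a UNIX script that checks for the file and calls the dsjob command. Something like this, where the variables are placeholders for your own values:
while [ -n "$keep_running" ]    # loop for as long as this variable is non-empty
do
    # true when at least one of the three files has landed
    if ls "$landing_dir"/EM01*.M "$landing_dir"/EM11*.M "$landing_dir"/EM13*.M 2>/dev/null | grep -q .
    then
        # the job must move or rename the file, or it will be picked up again
        dsjob -run -jobstatus "$project" "$job"
    else
        sleep 2    # wait for some time before checking again
    fi
done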
Nag
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Mahadev,

If I am not mistaken, you are talking about the Start Loop and End Loop activities in a sequence job.
But how do I make this run continuously? We can't hard-code the file names, since the date in the file name changes every day, so how can this be made generic?

Ray,

No, that is not the case; the three files land at different times.

Any suggestions?
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

If you want to process those files one by one then you can use the diagram below

start loop-------ls -lt EM01*|wc -l ----0-----Sleep 60-----End loop
                                  |
                                  |
                                  |
                               take the first file (if more than 1)
                                  |
                                  |
                             Process the job
                                  |
                                  |
                              Rename/Move the file
                                  |
                                  |
                          End loop Activity
Else you can tweak it a bit to work as you want. And this can also be derived from the earlier posts.
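The "take the first file" and "Rename/Move the file" steps in the diagram could look roughly like this in shell. This is a sketch under assumptions: the directory names are placeholders, and "first" is interpreted here as the oldest file by modification time.

```shell
#!/bin/sh
# Sketch only: directory names are placeholders, not taken from the thread.

# Print the oldest file matching PREFIX*.M in DIR, or nothing if none exist.
# Parsing ls output is tolerable here only because these file names contain
# no spaces or newlines.
oldest_file() {
    dir=$1
    prefix=$2
    ls -tr "$dir"/"$prefix"*.M 2>/dev/null | head -n 1
}

# Move a processed file out of the landing directory so that the next
# iteration of the loop does not pick it up again.
archive_file() {
    file=$1
    archive_dir=$2
    mkdir -p "$archive_dir" && mv "$file" "$archive_dir/"
}
```

Inside the loop, an empty result from `oldest_file /data/landing EM01` corresponds to the "0" branch (sleep and loop again); otherwise the returned file is processed and then passed to `archive_file`.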
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped.
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Thanks Priyadarshi.
I will update on this.

Regards,
Sjordery.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

I'm not sure that will make it run continuously, as I am pretty sure there is a limit to the number of iterations that a sequence can make.

You can, however, have a multiple instance sequence and have the sequence call itself with a different instance. You only need two instances for this to work, i.e. after a specified number of iterations, instance 0 can start instance 1 and vice versa. That way the sequence is continuously running.
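One way to picture the hand-over is as the command the running instance would issue (for example from an Execute Command activity) when it reaches its iteration limit. This is only a sketch: `MyProject`, `WatchEMFiles` and the `inst0`/`inst1` invocation ids are made-up names.

```shell
#!/bin/sh
# Sketch only: PROJECT, SEQUENCE and the invocation ids are made-up names.
DSJOB=${DSJOB:-dsjob}
PROJECT=${PROJECT:-MyProject}
SEQUENCE=${SEQUENCE:-WatchEMFiles}

# Given the invocation id that is running now, print the one to start next.
other_instance() {
    case $1 in
        inst0) echo inst1 ;;
        *)     echo inst0 ;;
    esac
}

# Start the other invocation without waiting for it, so the current one can
# finish its own run while the new one takes over the polling.
hand_over() {
    "$DSJOB" -run "$PROJECT" "$SEQUENCE.$(other_instance "$1")"
}
```

With this, `hand_over inst0` requests a run of `WatchEMFiles.inst1`, and `hand_over inst1` requests `WatchEMFiles.inst0`.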
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The only thing that is safe to run 'continuously' is an RTI enabled / SOA / WISD job. I wouldn't even consider anything else. At the very least, give it a breather, a chance to stop and restart again - for example, run it over 'business hours' and then let it sit idle for some period at night... or vice versa... even if that 'rest period' is only a few minutes.
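For a polling script, the "breather" could be implemented as a time window: poll only inside the window, then exit and let the scheduler start the script again the next day. A minimal sketch, assuming an arbitrary 07:00 to 19:00 window:

```shell
#!/bin/sh
# Sketch only: the 07:00-19:00 window is an arbitrary example.
START_HOUR=${START_HOUR:-7}
END_HOUR=${END_HOUR:-19}

# Succeed when HOUR (0-23) is at or after START_HOUR and before END_HOUR.
in_window() {
    hour=$1
    [ "$hour" -ge "$START_HOUR" ] && [ "$hour" -lt "$END_HOUR" ]
}

# Print the current hour without a leading zero, so that later shell
# arithmetic does not treat 08 or 09 as invalid octal numbers.
current_hour() {
    h=$(date +%H)
    echo "${h#0}"
}
```

A loop written as `while in_window "$(current_hour)"; do ...; done` then ends by itself at the end of the window, which gives the process its idle period.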
-craig

"You can never have too many knives" -- Logan Nine Fingers