Real-time file processing - Design question
Posted: Wed Feb 14, 2007 4:15 pm
Hi,
We have a requirement to process data files received from external vendors in real time, as soon as they arrive. Until now we have been batch loading these files at pre-scheduled times using the AutoSys scheduling tool.
A couple of options we are considering are:
Option 1:
----------
Use the AutoSys file watcher to detect the arrival of a new file and call a DataStage job sequence. That means one instance of the job sequence for each file. The issues we have with this option are monitoring and scalability.
Monitoring - With our current batch load framework, AutoSys executes the DataStage job sequence (through dsjob -run) and waits until it completes to get the status back, so that the job status can be monitored and the production support team can be notified of failures. Real-time processing means the AutoSys job can no longer wait for the DS job to finish, because it needs to continue with other files. That means we would need another mechanism for monitoring the jobs started by AutoSys.
Scalability - What if we get 50 files at the same time? That would mean 50 instances of the job sequence, and the subsequent load jobs, all running at the same time.
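To make the two concerns concrete, here is a minimal sketch of a small dispatcher that would sit between the file watcher and DataStage. It caps how many sequences run at once and collects each run's exit code for monitoring. It is only an illustration under assumptions: the project name MyProject, the sequence name SeqLoadVendorFile, the parameter name InputFile and the cap of 5 are all hypothetical, and it assumes dsjob is on the PATH and that the sequence is multi-instance so several copies can run side by side.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_PARALLEL = 5  # assumed cap on concurrent sequences; tune to server capacity


def build_command(path):
    """Build the dsjob call for one file.

    MyProject, SeqLoadVendorFile and InputFile are hypothetical names.
    -jobstatus makes dsjob wait for the job and return an exit code
    derived from the job status (check the dsjob documentation for the
    exact meaning of each code).
    """
    return [
        "dsjob", "-run", "-jobstatus",
        "-param", f"InputFile={path}",
        "MyProject", "SeqLoadVendorFile",
    ]


def process_files(paths, build_cmd=build_command, max_parallel=MAX_PARALLEL):
    """Run one command per file, at most max_parallel at a time.

    Returns a dict mapping each file path to the exit code of its run,
    so failures can be reported to production support afterwards.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Each worker thread blocks on its own dsjob call; the pool size
        # limits how many sequences are running at any moment.
        futures = {pool.submit(subprocess.call, build_cmd(p)): p for p in paths}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

With 50 files arriving together, process_files(list_of_50_paths) would start only 5 sequences at a time and queue the rest, and the returned dictionary gives one status per file to check for failures.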
Option 2:
----------
RTI. Is RTI suitable for this kind of processing?
Most of my knowledge about RTI is from the manuals and from this forum. I understand there are limitations on which jobs can be RTI-enabled, especially with job sequences and with jobs that have both input and output links to passive stages. Moreover, we haven't licensed RTI, but we are willing to if I can truly convince management that RTI is the right solution.
Any feedback or new options are welcome.
Regards
Sri