Pros and Cons of I/O

Posted: Wed Jul 08, 2009 12:05 pm
by bala_135
Hi All,

We are in the process of standardizing datastage design techniques.

When it comes to modularity, I/O issue was raised.
Can I get some input on the pros and cons of modular design versus designing everything in one single job, in terms of speed and resources?
Since modular design involves staging data in intermediate datasets, is that approach better than designing in one job?

Thanks in advance.

Regards,
Bala.

Posted: Wed Jul 08, 2009 1:14 pm
by Sainath.Srinivasan
This is a very open-ended question. Similar issues have been discussed several times before.

What is your requirement? There are many variables, such as the number of source tables, the transformation logic, available space, data volume, machine capacity, and so on.

So it is better to decide on a case-by-case basis.

Posted: Wed Jul 08, 2009 3:09 pm
by kduke
I agree, a little bit. The trade-offs are performance versus hardware; these always exist. There are also trade-offs between stability and performance. If you have unlimited funds and need to get the data loaded as quickly as possible, then that may look different as far as the number of times you land the data. If you need to reuse your lookups, then that makes datasets more important. If you have resources available on the front-end or back-end databases, then you can also push work into those databases. That can change the look of your ETL. That may be in the product soon; they demonstrated it at IOD last year.

Posted: Wed Jul 08, 2009 3:34 pm
by asorrell
The other trade-off can be I/O vs. recoverability. At one end of the spectrum you have the very large PX job that does one read of the data, lots of transformation, and writes the end result out to the target table. It typically has the best throughput, but it can also be the hardest job to restart and recover when something goes wrong.
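The recoverability point above is a general pattern, not specific to DataStage: landing data at stage boundaries gives you restart points. A minimal sketch in Python (the file names, the `run_stage` helper, and the toy transforms are all illustrative assumptions, not anything from the product) shows the idea: each stage skips its work if its landed output already exists, so a failed run restarts from the last completed stage instead of re-reading the source.

```python
import os

def run_stage(name, infile, outfile, transform):
    """Run one pipeline stage, landing its output to a file.

    If the landed output already exists, the stage is skipped --
    this is what makes a multi-stage run restartable.
    """
    if os.path.exists(outfile):
        print(f"{name}: output exists, skipping (restart)")
        return
    with open(infile) as f:
        rows = [transform(line.rstrip("\n")) for line in f]
    tmp = outfile + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(rows) + "\n")
    os.rename(tmp, outfile)  # atomic rename: never a half-written landed file
    print(f"{name}: wrote {len(rows)} rows")

# Toy end-to-end run with two stages and one intermediate "dataset".
with open("source.txt", "w") as f:
    f.write("a\nb\n")
run_stage("extract", "source.txt", "stage1.ds", str.upper)
run_stage("load", "stage1.ds", "target.txt", lambda r: r + ",loaded")
```

Re-running the script after a failure in the second stage would skip the first stage entirely; that saved re-read is exactly the recoverability you trade the extra I/O for.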

Posted: Wed Jul 08, 2009 7:19 pm
by kduke
I lump that into what I call stability. I agree totally.

Posted: Wed Jul 08, 2009 11:53 pm
by bala_135
Thanks Andy, Kim.

I don't see any open-endedness in the question. It was very specific: staging jobs' output in intermediate datasets allows modularity in the design, but resource usage and I/O speed need to be optimized. Anyone who has done this optimization before is most welcome to share their views.

Andy: I have got some input from your answer.
Throughput is higher when designing single jobs, but recovery, maintainability, and scalability are the potential issues.

Correct me if I am wrong.
Thanks for the inputs.


Regards,
Bala.