Pros and Cons of I/O

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Pros and Cons of I/O

Post by bala_135 »

Hi All,

We are in the process of standardizing DataStage design techniques.

When it comes to modularity, the I/O issue was raised.
Can I get some input on the pros and cons of a modular design versus a single-job design in terms of speed and resources?
Since a modular design involves staging data in intermediate datasets, is that approach better than designing everything in one job?

Thanks in advance.

Regards,
Bala.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

This is a very open-ended question. Similar issues have been discussed several times before.

What is your requirement? There are many variables, such as the number of tables being sourced, transformation logic, space available, volume of data, machine capacity, etc.

So it is better to decide on a case-by-case basis.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I agree a little bit. The trade-offs are in performance versus hardware, and they always exist. There are also trade-offs between stability and performance. If you have unlimited funds and need to get the data loaded as quickly as possible, then that may look different as far as the number of times you land the data. If you need to reuse your lookups, then that makes datasets more important. If you have resources available on the front-end or back-end databases, then you can also push work into those databases. That can change the look of your ETL. That may be in the product soon; they demonstrated it last year at IOD.
Mamu Kim
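To make the lookup-reuse point concrete, here is a minimal sketch in Python (not DataStage) of the trade-off: landing reference data once as a staged file costs one extra write, but every downstream job can then reuse it instead of re-extracting from the source. The file name and the extract_lookup helper are hypothetical.

```python
import csv
import os

STAGED_LOOKUP = "customer_lookup.csv"  # hypothetical staged "dataset"


def extract_lookup():
    """Simulate an expensive extract of reference data from a source system."""
    return [{"cust_id": "1", "region": "APAC"},
            {"cust_id": "2", "region": "EMEA"}]


def stage_lookup():
    """Land the lookup once, so every downstream job reads the staged copy."""
    rows = extract_lookup()                      # one expensive extract
    with open(STAGED_LOOKUP, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["cust_id", "region"])
        writer.writeheader()
        writer.writerows(rows)


def load_staged_lookup():
    """Cheap read of the landed lookup, reusable by any number of jobs."""
    with open(STAGED_LOOKUP, newline="") as f:
        return {row["cust_id"]: row["region"] for row in csv.DictReader(f)}


if __name__ == "__main__":
    if not os.path.exists(STAGED_LOOKUP):
        stage_lookup()                           # pay the landing cost once
    # Two "jobs" reuse the same staged lookup instead of re-extracting it.
    for job in ("load_orders", "load_invoices"):
        lookup = load_staged_lookup()
        print(job, "resolved region for cust 1:", lookup["1"])
```

The same reasoning applies to a dataset landed between jobs: the staging cost is paid once, while the cheaper read cost is paid per consumer.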
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

The other tradeoff can be I/O vs. recoverability. At one end of the spectrum you have the very large PX job that does one read of the data, lots of transformation and writes the end-result out to the target table. It typically has the best throughput, but it can also be the hardest job to restart and recover when something goes wrong.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
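As a rough sketch of that recoverability trade-off, assuming generic landed files rather than DataStage datasets and hypothetical stage names, a modular run that checkpoints each stage can resume from the last landed result, whereas the single large job has to be rerun end to end after a failure.

```python
import os


# Hypothetical pipeline stages; each modular step lands its result.
def extract():
    return ["raw1", "raw2"]


def transform(rows):
    return [r.upper() for r in rows]


def load(rows):
    print("loaded:", rows)


STAGES = [
    ("extract.ds",   lambda _: extract()),
    ("transform.ds", lambda rows: transform(rows)),
]


def run_modular():
    """Run each stage and land its output; completed stages are skipped on restart."""
    data = None
    for landed_file, stage in STAGES:
        if os.path.exists(landed_file):          # checkpoint hit: reuse landed data
            with open(landed_file) as f:
                data = f.read().splitlines()
            continue
        data = stage(data)                       # do the work for this stage
        with open(landed_file, "w") as f:        # extra I/O buys restartability
            f.write("\n".join(data))
    load(data)


def run_monolithic():
    """Single pass: best throughput, but any failure means rerunning everything."""
    load(transform(extract()))


if __name__ == "__main__":
    run_monolithic()
    run_modular()   # rerunning this after a failure resumes from the landed files
```

The extra writes in run_modular are the I/O cost being discussed; what they buy is the ability to skip already-completed stages when the run is restarted.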
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I lump that into what I called stability. I agree totally.
Mamu Kim
bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Thanks Andy, Kim.

I don't see any open-endedness in the question. It was very specific: staging data in intermediate datasets allows modularity in design, but resource usage and I/O speed need to be optimized. Anyone who has done this optimization earlier is most welcome to share their views.

Andy: I have got some input from your answer.
Throughput is higher when designing a single job, but recovery, maintainability, and scalability are the potential issues.

Correct me if I am wrong.
Thanks for the inputs.


Regards,
Bala.