Pros and Cons of I/O

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Pros and Cons of I/O

Post by bala_135 »

Hi All,

We are in the process of standardizing DataStage design techniques.

When it comes to modularity, the I/O issue was raised.
Can I get some input on the pros and cons of a modular design versus a single-job design in terms of speed and resources?
Since a modular design involves staging data in intermediate datasets, is that approach better than designing everything in one job?

Thanks in advance.

Regards,
Bala.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

This is a very open-ended question. Similar issues have been discussed several times before.

What is your requirement? There are many variables, such as the number of tables being sourced, transformation logic, space available, volume of data, machine capacity, etc.

So it is better to decide on a case-by-case basis.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I agree a little bit. The trade-offs are in performance versus hardware, and they always exist. There are also trade-offs between stability and performance. If you have unlimited funds and need to get the data loaded as quickly as possible, then that may look different as far as the number of times you land the data. If you need to reuse your lookups, then that makes datasets more important. If you have resources available on the front-end or back-end databases, then you can also push work into those databases. That can change the look of your ETL. That may be in the product soon; they demonstrated it last year at IOD.
Mamu Kim
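To make the lookup-reuse point concrete, here is a minimal sketch in Python (not DataStage) of the trade-off: landing reference data once as a staged file costs one extra write, but every downstream job can then reuse it instead of re-extracting from the source. The file name and the extract_lookup helper are hypothetical.

```python
import csv
import os

STAGED_LOOKUP = "customer_lookup.csv"  # hypothetical staged "dataset"


def extract_lookup():
    """Simulate an expensive extract of reference data from a source system."""
    return [{"cust_id": "1", "region": "APAC"},
            {"cust_id": "2", "region": "EMEA"}]


def stage_lookup():
    """Land the lookup once, so every downstream job reads the staged copy."""
    rows = extract_lookup()                      # one expensive extract
    with open(STAGED_LOOKUP, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["cust_id", "region"])
        writer.writeheader()
        writer.writerows(rows)


def load_staged_lookup():
    """Cheap read of the landed lookup, reusable by any number of jobs."""
    with open(STAGED_LOOKUP, newline="") as f:
        return {row["cust_id"]: row["region"] for row in csv.DictReader(f)}


if __name__ == "__main__":
    if not os.path.exists(STAGED_LOOKUP):
        stage_lookup()                           # pay the landing cost once
    # Two "jobs" reuse the same staged lookup instead of re-extracting it.
    for job in ("load_orders", "load_invoices"):
        lookup = load_staged_lookup()
        print(job, "resolved region for cust 1:", lookup["1"])
```

The same reasoning applies to a dataset landed between jobs: the staging cost is paid once, while the cheaper read cost is paid per consumer.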
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

The other tradeoff can be I/O vs. recoverability. At one end of the spectrum you have the very large PX job that does one read of the data, lots of transformation and writes the end-result out to the target table. It typically has the best throughput, but it can also be the hardest job to restart and recover when something goes wrong.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
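As a rough sketch of that recoverability trade-off, assuming generic landed files rather than DataStage datasets and hypothetical stage names, a modular run that checkpoints each stage can resume from the last landed result, whereas the single large job has to be rerun end to end after a failure.

```python
import os


# Hypothetical pipeline stages; each modular step lands its result.
def extract():
    return ["raw1", "raw2"]


def transform(rows):
    return [r.upper() for r in rows]


def load(rows):
    print("loaded:", rows)


STAGES = [
    ("extract.ds",   lambda _: extract()),
    ("transform.ds", lambda rows: transform(rows)),
]


def run_modular():
    """Run each stage and land its output; completed stages are skipped on restart."""
    data = None
    for landed_file, stage in STAGES:
        if os.path.exists(landed_file):          # checkpoint hit: reuse landed data
            with open(landed_file) as f:
                data = f.read().splitlines()
            continue
        data = stage(data)                       # do the work for this stage
        with open(landed_file, "w") as f:        # extra I/O buys restartability
            f.write("\n".join(data))
    load(data)


def run_monolithic():
    """Single pass: best throughput, but any failure means rerunning everything."""
    load(transform(extract()))


if __name__ == "__main__":
    run_monolithic()
    run_modular()   # rerunning this after a failure resumes from the landed files
```

The extra writes in run_modular are the I/O cost being discussed; what they buy is the ability to skip already-completed stages when the run is restarted.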
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I lump that into what I called stability. I agree totally.
Mamu Kim
bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Thanks Andy, Kim.

I don't see any open-endedness in the question. It was very specific: staging data in intermediate datasets allows modularity in design, but resource usage and I/O speed need to be optimized. Anyone who has done this optimization earlier is most welcome to share their views.

Andy: I have got some input from your answer.
Throughput is higher when designing a single job, but recovery, maintainability, and scalability are the potential issues.

Correct me if I am wrong.
Thanks for the inputs.


Regards,
Bala.