AIX vs Linux DataStage performance

Posted: Thu Apr 27, 2017 10:02 am
by thompsonp
I've used different versions of DataStage on AIX, Linux and Windows over the years.
However, I've never done any comparison of the relative performance on different platforms.
The underlying hardware is likely to be different for AIX and Linux (although you could run Linux or AIX on an IBM Power CPU box).
I know real world performance will depend on lots of things, not least the specifics of what our ETL jobs do and what they are connecting with, but does anyone have any metrics to compare performance between the platforms when using DataStage?

What I am really interested in finding out is the relative performance against cost. Not only is the hardware priced quite differently but the PVU licensing costs seem to be weighted in favour of running Power8 (or previous Power generation) CPUs. For example I've seen a 4 cpu, 24 core Power8 compare favourably to a 4 cpu, 70 core Xeon box (although DataStage wasn't being used).
The PVU difference between the two is significant.
The PVU rating of a Power system running Linux is also significantly lower than that of one running AIX, which makes me wonder whether running Linux on Power carries a significant performance disadvantage.

Any thoughts or real world comparisons welcome please.

Posted: Thu Apr 27, 2017 2:41 pm
by qt_ky
It would be interesting to see some benchmarks like that.

The IBM PVU calculator does have a drop-down choice for Linux on any POWER system with a ratio of 70 value units per core. That matches up with some of the other POWER8 server models that also have a ratio of 70 which could also run AIX. Yet there are still other POWER models with ratios of 80, 100, or 120. Perhaps it represents some sort of Linux discount?
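To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes only the per-core ratios mentioned above (70, 80, 100, 120) and borrows the 24-core count from the first post; the function name is made up for illustration, and the real rating for any given server model should be checked against the IBM PVU table.

```python
# Sketch only: PVU entitlement is cores multiplied by the per-core ratio.
# The ratios are the ones quoted in this thread, not an authoritative list.

def total_pvus(cores: int, pvu_per_core: int) -> int:
    """Return the total PVUs for a server with the given core count."""
    return cores * pvu_per_core

cores = 24  # core count taken from the Power8 example in the first post

for ratio in (70, 80, 100, 120):
    print(f"{cores} cores at {ratio} PVU/core = {total_pvus(cores, ratio)} PVUs")
```

At 24 cores the gap between the lowest and highest ratio is 1200 PVUs, so which ratio applies matters a good deal for the licence cost.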

Posted: Thu Apr 27, 2017 5:28 pm
by rkashyap
Another criterion to think about is the availability of features and support.

DataStage features vary by operating system, e.g. BDFS was initially launched only for Linux and was made available for AIX more than a year later, and even then only for connectivity to BigInsights.
These variations are not really highlighted and are easy to overlook. Do ask these questions to ensure that DataStage on AIX meets your business needs.

Posted: Fri Apr 28, 2017 12:44 pm
by PaulVL
It's been my experience that the performance of a DataStage job is throttled not by CPU but by I/O to and from your data sources.

Disk, database, or SFTP... that has mainly been the deciding factor on job speed. Poor job design plays a part as well.


I personally prefer RHEL.

Posted: Fri May 05, 2017 2:44 am
by thompsonp
Thanks for your thoughts.
I've spent a while trying to find some metrics / benchmarks / comparisons without much success.
I did find a paper where Intel and IBM ran some tests to show the overhead of running in a virtual environment - a 5 to 10% drop compared with the physical server.

I've still not been able to find a comparison between AIX and Linux - though I am waiting to see if IBM have any data available to help size a new Linux installation. In the past they have sized a server based on the volume of input data and an assumption about how that grows through the ETL e.g. a 1GB file may generate 3GB of interim data before being loaded.
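As a rough illustration of that style of sizing, here is a minimal sketch. It assumes a single flat growth factor of 3 (taken from the 1GB to 3GB example) and a worst case where all inputs are in flight at once; the function names and the sample file sizes are invented for illustration and are not IBM's actual method.

```python
# Sketch only: estimate interim (scratch) data from input volume and an
# assumed growth factor. The factor of 3 comes from the 1GB -> 3GB example.

def interim_gb(input_gb: float, growth_factor: float = 3.0) -> float:
    """Interim data generated by one input of the given size."""
    return input_gb * growth_factor

def peak_interim_gb(input_sizes_gb, growth_factor: float = 3.0) -> float:
    """Worst case: every listed input is being processed at the same time."""
    return sum(interim_gb(size, growth_factor) for size in input_sizes_gb)

# Hypothetical nightly batch of three input files (sizes in GB).
batch = [1.0, 2.5, 0.5]
print(f"Peak interim data: {peak_interim_gb(batch)} GB")
```

In practice the growth factor will differ per job, so this only gives a starting figure to check against real runs.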

So let me open up the question and ask if anyone has any metrics for their DataStage installs running on Linux? What server specs do you have and how much data do you process an hour for example?