
DS Server - Benchmark

Posted: Wed Feb 02, 2005 8:25 am
by akrzy
Hi,
Could someone tell me about the performance of DataStage jobs?
I don't have a fast CPU or a lot of RAM, so I can't test DataStage myself.

Can you share your experience with me,
and send me any benchmarks?

Thanks in advance,
ANka

Re: DS Server - Benchmark

Posted: Wed Feb 02, 2005 8:35 am
by ogmios
Why worry about performance and benchmarks of DataStage when you don't have DataStage? Performance varies widely depending on how you write your jobs and what the job actually has to accomplish.

As for comparing Informatica and DataStage: both can do the same things, and both have their strengths and weaknesses. Informatica is more expensive than DataStage, but don't expect to run DataStage on any UNIX server for less than 100K USD. I'm not familiar with the Windows pricing of DataStage.

Ogmios

Posted: Wed Feb 02, 2005 8:47 am
by akrzy
I use DataStage, but on my own notebooks.
I would like to know what the performance is on a really large system with a lot of RAM and CPU.

Posted: Wed Feb 02, 2005 8:53 am
by chulett
Really fast. :wink:

Posted: Wed Feb 02, 2005 9:05 am
by kcbland
akrzy wrote:I use DataStage but on my own notebooks.
And I would like to know what is the performance on the really huge system with a lot of RAM and CPU.

You cannot even begin to write an equation that takes into consideration all of the variables involved. A Windoze PC cannot even compare to a 24-CPU SMP with an EMC disk farm and 48 GB of memory. A job that runs fast on a single-CPU Windoze server may even run faster there than a single instance would on a Unix box, but when you run 24 copies/instances of that job on the Unix box, it blows away the Windoze server.

So you see, you cannot compare the two environments, because things you can do sloppily on Windoze may not be forgiven on a Unix environment, and vice versa.

Posted: Wed Feb 02, 2005 9:10 am
by akrzy
OK, I know that I can't compare them, but I need some examples to show our client that he should use DataStage.

So please, if you have any benchmark (a good benchmark), perhaps you could tell me about the results.

Posted: Wed Feb 02, 2005 10:58 am
by ogmios
Should he use DataStage? :lol:

For some sites I'm sure it would be much more cost-effective to "Do-It-Yourself" in Perl or via Oracle Warehouse Builder, ... Do an ROI analysis over a longer period, e.g. 5 to 10 years, and see where you end up.

There are no benchmarks for DataStage. Benchmarks require that you have some standard to run against. So you would have to have an identical task that can be performed.

To give you some examples: on a Sun server, loading a file with 120-character rows into a DB2 database running on the same machine reaches 500 to 800 rows/s. Load the same file into a remote database and it drops to 200 to 400 rows/s.

If you load/insert into Oracle on a nearby machine you get up to 1200 rows/s... if you update data in Oracle you sometimes drop to 50 to 100 rows/s...
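To put those rates in perspective, here is a small sketch that converts a rows-per-second rate into elapsed hours for a fixed row count. The 1,000,000-row volume is a hypothetical figure chosen only for illustration; the rates are the lower bounds of the ranges quoted above.

```python
def load_hours(row_count: int, rows_per_sec: float) -> float:
    """Elapsed hours to process row_count rows at a constant rows_per_sec rate."""
    return row_count / rows_per_sec / 3600.0


ROWS = 1_000_000  # hypothetical volume, for illustration only

# Lower bound of each rate range (rows/s) quoted in the post above.
scenarios = {
    "DB2, same machine": 500,
    "DB2, remote": 200,
    "Oracle insert, nearby machine": 1200,
    "Oracle update": 50,
}

for name, rate in scenarios.items():
    print(f"{name:32s} {load_hours(ROWS, rate):6.2f} h")
```

The same million rows take roughly a quarter of an hour as Oracle inserts but over five hours as Oracle updates, which is why a single "rows per second" figure says little on its own.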

In general it depends on too many factors. Maybe it's an idea to create a standard problem to be solved and compare speeds between different setups/tools.
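The "standard problem" idea can be sketched as a minimal timing harness: run one fixed task several times and report the median elapsed time and the resulting rows/s. `standard_task` is a hypothetical stand-in workload, not a DataStage job; in a real comparison it would be the same job run against identical input data on each setup or tool.

```python
import statistics
import time
from typing import Callable


def benchmark(task: Callable[[], int], runs: int = 5) -> tuple[float, int]:
    """Run task `runs` times; return (median seconds, rows processed per run)."""
    timings = []
    rows = 0
    for _ in range(runs):
        start = time.perf_counter()
        rows = task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), rows


def standard_task() -> int:
    """Hypothetical stand-in workload: transform a fixed set of 120-character rows."""
    data = ["x" * 120 for _ in range(100_000)]
    out = [row.upper() for row in data]
    return len(out)


median_s, rows = benchmark(standard_task)
print(f"{rows} rows, median {median_s:.3f} s, {rows / median_s:,.0f} rows/s")
```

Taking the median of several runs damps one-off effects such as cold caches or other load on the machine.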

Ogmios

Posted: Wed Feb 02, 2005 3:07 pm
by ray.wurlod
Rows per second is meaningless except to compare different runs of the same job processing exactly the same data and provided that there is a substantial volume of data.

A job with 200-byte rows might achieve 3000 rows per second. The same job with 2000-byte rows would be expected to achieve about 300 rows per second. But the same volume of data has been processed.
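A quick sketch of the normalization behind this point: multiplying rows/s by row width gives bytes/s, and both of the example jobs come out at the same data volume per second.

```python
def bytes_per_sec(rows_per_sec: float, row_bytes: int) -> float:
    """Convert a rows-per-second rate into data volume per second."""
    return rows_per_sec * row_bytes


small = bytes_per_sec(3000, 200)   # 200-byte rows at 3000 rows/s
large = bytes_per_sec(300, 2000)   # 2000-byte rows at 300 rows/s
print(f"{small:,.0f} bytes/s vs {large:,.0f} bytes/s")  # 600,000 bytes/s vs 600,000 bytes/s
```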

There are some benchmarks published on the Ascential web site - search there for "benchmark".

DataStage is very scalable; you can, in general, throw more resources at it and get better "performance".

You should also be specific about what you mean by performance. In the ETL world, the key "performance" indicator is typically the ability to finish processing within a given time window, with a safety margin.
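The "finish within the window, with a safety margin" criterion can be expressed as a simple check. This is a minimal sketch; the 4-hour window and the 20% margin are hypothetical values chosen for illustration, not figures from the thread.

```python
from datetime import timedelta


def fits_window(estimated: timedelta, window: timedelta, margin: float = 0.2) -> bool:
    """True if the estimated run time fits in the window while leaving `margin`
    (a fraction of the window) unused as a safety buffer."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return estimated <= window * (1 - margin)


window = timedelta(hours=4)  # hypothetical nightly batch window
print(fits_window(timedelta(hours=3), window))              # 3h <= 3h12m -> True
print(fits_window(timedelta(hours=3, minutes=30), window))  # 3h30m > 3h12m -> False
```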

Posted: Tue May 16, 2006 6:35 am
by slinni
Is there any published literature, or are there any tips, on the performance of DataStage on various hardware and OS platforms (Linux vs. Solaris)?

I understand that actual capacity planning will depend on the expected load on the ETL server, etc., but if we know we want to stay on DataStage (Server Edition 7.5.1a), will the hardware and OS brand make a difference?

Posted: Tue May 16, 2006 8:38 am
by raoraghunandan
slinni wrote:Is there any published literature or tips on the performance of DataStage on various hardware & OS platforms(linux vs. Solaris).

I understand that actual capacity planning will depend on expected load on the ETL server, etc. but if I know we want to stay on DataStage(Server edition 7.5.1a) will the hardware & OS brand make a difference?
To the best of my knowledge, there is no published literature on DataStage/ETL tool performance benchmarks. However, paid performance benchmarking services are likely available from some of the leading IT consultancies.

-Raghu

Posted: Tue May 16, 2006 8:58 am
by ArndW
Sizing systems and software has always been more of an arcane art than a science, and with multiuser systems becoming increasingly complex, it is impossible to come up with a formula that is accurate enough to be usable.

Posted: Tue May 16, 2006 4:34 pm
by vmcburney
There are sites out there running DataStage on hundreds of servers in a grid. Speed and throughput depend on how many CPU, disk and RAM resources you throw at it, the bandwidth of your network, and the performance of your source and target databases.