DS Server - Benchmark

akrzy
Participant
Posts: 121
Joined: Wed Dec 08, 2004 4:46 am

DS Server - Benchmark

Post by akrzy »

Hi,
Could someone tell me about the performance of DataStage jobs?
I don't have a fast CPU or a lot of RAM, so I can't test DataStage myself.

Can you share your experience with me?
And could you send me any benchmarks?

Thanks in advance,
ANka
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: DS Server - Benchmark

Post by ogmios »

Why worry about performance and benchmarks of DataStage when you don't have DataStage? Performance varies widely depending on how you write your jobs and what the job actually has to accomplish.

As for a comparison of Informatica and DataStage: both can do the same job, and both have their weaknesses and strengths. Informatica is more expensive than DataStage. But don't expect to run DataStage on any UNIX server for less than 100K USD. I'm not familiar with the Windows pricing of DataStage.

Ogmios
In theory there's no difference between theory and practice. In practice there is.
akrzy
Participant
Posts: 121
Joined: Wed Dec 08, 2004 4:46 am

Post by akrzy »

I use DataStage, but only on my own notebooks.
And I would like to know what the performance is on a really huge system with a lot of RAM and CPU.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Really fast. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

akrzy wrote:I use DataStage, but only on my own notebooks.
And I would like to know what the performance is on a really huge system with a lot of RAM and CPU.

You cannot even begin to write an equation that takes into consideration all of the variables involved. A Windoze PC cannot even compare to a 24-CPU SMP with an EMC disk farm and 48 GB of memory. A single job on a single-CPU Windoze server may run faster than it would on a Unix box, but when you're running 24 copies/instances of that job, the Unix box blows away the Windoze one.

So you see, you cannot compare the two environments, because things you can do sloppily on Windoze may not be forgiven in a Unix environment, and vice versa.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
akrzy
Participant
Posts: 121
Joined: Wed Dec 08, 2004 4:46 am

Post by akrzy »

OK, I know that I can't compare them, but I need some examples to show our client that he should use DataStage.

So please, if you have any benchmark (a good benchmark), perhaps you could tell me about the results.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

Should he use DataStage? :lol:

For some sites I'm sure it would be much more cost effective to "Do-It-Yourself" in Perl or via Oracle Warehouse Builder, ... Do an ROI calculation over a longer period, e.g. 5 to 10 years, and see where you end up.

There are no benchmarks for DataStage. Benchmarks require that you have some standard to run against. So you would have to have an identical task that can be performed.

To give you some examples: on a Sun server, loading a file with a 120-character row length into a DB2 database running on the same machine reaches 500 to 800 rows/s. Load the same file into a remote database and it drops to 200 to 400 rows/s.

If you load/insert into Oracle on a nearby machine you get up to 1200 rows/s... if you update data in Oracle you sometimes drop to 50 to 100 rows/s...

In general it depends on too many factors. Maybe it's an idea to create a standard problem to be solved and compare speeds between different setups/tools.
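One way to picture such a "standard problem" is a fixed input and a fixed task that is timed the same way on every setup. The sketch below is only an illustration in Python, not anything DataStage provides: `measure_rows_per_second` and `sample_job` are hypothetical names, and the uppercase-every-field workload is an arbitrary stand-in for a real job.

```python
import time


def measure_rows_per_second(job, rows):
    """Run job(rows) once and return (row_count, elapsed_seconds, rows_per_second)."""
    start = time.perf_counter()
    job(rows)
    elapsed = time.perf_counter() - start
    count = len(rows)
    # Guard against a zero reading on very coarse timers.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


def sample_job(rows):
    # Stand-in workload (an assumption): uppercase every field of every row.
    return [[field.upper() for field in row] for row in rows]


# Fixed synthetic input so that every setup runs the identical task.
rows = [["abc", "def", str(i)] for i in range(10_000)]
count, elapsed, rate = measure_rows_per_second(sample_job, rows)
print(f"{count} rows in {elapsed:.4f} s")
```

The figure that comes out is only comparable between runs of this same task on this same input, which is the point being made above.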

Ogmios
In theory there's no difference between theory and practice. In practice there is.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Rows per second is meaningless except to compare different runs of the same job processing exactly the same data and provided that there is a substantial volume of data.

A job with 200-byte rows might achieve 3000 rows per second. The same job with 2000-byte rows would be expected to achieve about 300 rows per second. But the same volume of data has been processed.
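The arithmetic behind this can be checked with a short sketch. The helper name `bytes_per_second` is made up for illustration; the row sizes and rates are the ones quoted above.

```python
def bytes_per_second(row_bytes, rows_per_second):
    """Convert a rows-per-second figure into bytes per second."""
    return row_bytes * rows_per_second


narrow = bytes_per_second(200, 3000)  # 200-byte rows at 3000 rows/s
wide = bytes_per_second(2000, 300)    # 2000-byte rows at 300 rows/s

# Both jobs move the same number of bytes each second.
print(narrow, wide)
```

Both cases come to 600,000 bytes per second, so the tenfold difference in rows per second says nothing about how hard the job is working.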

There are some benchmarks published on the Ascential web site - search there for "benchmark".

DataStage is very scalable; you can, in general, throw more resources at it and get better "performance".

You should also be specific about what you mean by performance. In the ETL world, the key "performance" indicator is typically the ability to finish processing within a given time window, with a safety margin.
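A rough way to express "finish within the window, with a safety margin" is sketched below. Everything here is an assumption for illustration: the function name `fits_window`, the 10 million row volume, the 2-hour window and the 20% margin are invented, and only the 3000 and 300 rows/s rates are borrowed from the example above.

```python
def fits_window(total_rows, rows_per_second, window_seconds, safety_margin=0.2):
    """Return (estimated_seconds, fits).

    fits is True when the estimated run time is no longer than the
    window reduced by the safety margin (0.2 keeps 20% in reserve).
    """
    estimated = total_rows / rows_per_second
    budget = window_seconds * (1 - safety_margin)
    return estimated, estimated <= budget


window = 2 * 60 * 60  # hypothetical 2-hour batch window, in seconds

fast_estimate, fast_ok = fits_window(10_000_000, 3000, window)
slow_estimate, slow_ok = fits_window(10_000_000, 300, window)

print(round(fast_estimate), fast_ok)
print(round(slow_estimate), slow_ok)
```

Under these assumed numbers the 3000 rows/s case needs about 56 minutes and fits, while the 300 rows/s case needs over 9 hours and does not.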
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
slinni
Premium Member
Posts: 5
Joined: Thu Oct 13, 2005 5:24 am

Post by slinni »

Is there any published literature or tips on the performance of DataStage on various hardware and OS platforms (Linux vs. Solaris)?

I understand that actual capacity planning will depend on expected load on the ETL server, etc., but if I know we want to stay on DataStage (Server edition 7.5.1a), will the hardware and OS brand make a difference?
raoraghunandan
Charter Member
Posts: 19
Joined: Sun Jul 20, 2003 4:29 am

Post by raoraghunandan »

slinni wrote:Is there any published literature or tips on the performance of DataStage on various hardware and OS platforms (Linux vs. Solaris)?

I understand that actual capacity planning will depend on expected load on the ETL server, etc., but if I know we want to stay on DataStage (Server edition 7.5.1a), will the hardware and OS brand make a difference?

To the best of my knowledge, there is no published literature on DataStage/ETL tool performance benchmarks. However, a paid performance benchmark service will certainly be available from some of the leading IT consultancy companies.

-Raghu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sizing systems and software has always been more of an arcane art than a science, and with multiuser systems becoming increasingly complex, it is impossible to come up with a formula that is accurate enough to be usable.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There are sites out there running DataStage on hundreds of servers in a grid. The speed and throughput depend on how much CPU, disk and RAM you throw at it, the bandwidth of your network, and the performance of your source and target databases.