CPU requirements DataStage 8.5

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

coface
Premium Member
Posts: 57
Joined: Fri Aug 10, 2007 8:13 am

CPU requirements DataStage 8.5

Post by coface »

Hi,
I'm looking for a document which describes the CPU requirements for DataStage 8.5.
We have installed DataStage on a virtual machine with Red Hat Enterprise Linux as the operating system. The CPUs are Intel-based (Xeon). The first test with one dual-core CPU ended with the clients no longer able to connect to the server (e.g. with DataStage Director), since the machine was 100% CPU-bound. The next test with two dual-core CPUs looked a lot better.
I have found a document which describes the network, disk, memory and OS requirements, but this document doesn't have any information about how many CPUs would be best.
Maybe this information is documented somewhere.
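For what it's worth, here is a minimal sketch of how the "100% bound" state can be quantified during such a test. It only compares the standard Linux load average against the core count; the check is illustrative and not from any IBM document:

Code: Select all

import os

# Compare the 1-minute load average with the number of cores.
# A sustained ratio well above 1.0 means the box is CPU-bound,
# which is when client connections (e.g. Director) start to fail.
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count()
print(f"load {load1:.2f} on {cores} cores -> {load1 / cores:.2f}x")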

Any help will be appreciated
Rgds
JH
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There isn't any such document, although there may be anecdotal evidence from people here. CPU would fall into what Ray has called the "more is better" camp, I would imagine, much like disk and memory, but I've never seen any sort of minimum or recommended CPU count listed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
coface
Premium Member
Posts: 57
Joined: Fri Aug 10, 2007 8:13 am

Post by coface »

That's what I was worried about :(
I certainly had hoped that IBM had documented the CPU requirements (as Oracle does for its database software). The license costs depend on the number of CPUs, yet it's up to the end user to figure out at least how many CPUs are needed. Pretty easy to stick with "more is better" since the user has to pay for that :shock:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's because CPU really is an "it depends" factor. If no jobs were running, your clients would have no trouble connecting. Why would you expect IBM to be able to predict the load that you propose to put on the servers? It's just not possible. The equation is simply supply and demand, as with all computer tuning. If demand increases and there's no surplus supply, then supply needs to increase too; otherwise there's a penalty.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
coface
Premium Member
Posts: 57
Joined: Fri Aug 10, 2007 8:13 am

Post by coface »

I know that this is something of a difficult discussion, since it also depends on one's point of view. I agree that IBM will not be able to know how much load each customer will put on the servers. But I expected some kind of recommendation like "when running x parallel jobs on the machine, y CPUs will be best". Of course it depends on what the jobs will do, and that's certainly pretty difficult to define. But some kind of statistics, examples or benchmarks should be available; that's what I expect from such expensive software...
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Your sales team has access to a sizing tool, but like anything, it produces a "swag", a guesstimate of what you might need. It takes a lot of things into account, as you should, such as the expected number of developers, the number of concurrent jobs you might want to run, the quantity of data, thoughts on the processing time required, the degree of parallelism that is required, etc.

...and such tools are just "estimates". Only a far more time-consuming review of your environment and requirements is going to yield something more serious.

The best "gut feel guideline" I've ever heard from one of our deep parallel experts is to try and have, during worst case, heavy-major-duty-processing, no more than 10 osh processes per core. Then consider the number of typical osh processes being spawned by your EE Jobs, the number of Jobs that you might be concurrently running (or testing in dev), the degree of parallelism that you are requesting, the number of "instances" of any multi-instance, Jobs, etc. Of course, that's the worst case, and your system will probably function fine with more osh processes per core than that, and certainly with less, and the types of transformation, I/O and other work you are doing will also have an impact.

There aren't any magic bullets here, and while it's an entirely different domain, I'm sure a deep Oracle expert would say the same about the estimates they provide for the RDBMS.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
coface
Premium Member
Posts: 57
Joined: Fri Aug 10, 2007 8:13 am

Post by coface »

I agree: it just depends...
And regarding Oracle: Oracle provides several benchmarks and white papers about HW requirements. The Oracle docs recommend at least how much CPU, RAM, etc. is needed...
Many thanks to all who have discussed this question with me. The point about "no more than 10 osh processes per core" sounds pretty helpful. We will use this measurement for the coming stress tests...
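Something like the following could track that ratio during the tests (a rough sketch; it assumes the parallel engine's player processes show up as "osh" in the Linux process table):

Code: Select all

import os
import subprocess

# Count running osh processes and compare against the
# "no more than ~10 per core" guideline.
result = subprocess.run(["pgrep", "-c", "osh"],
                        capture_output=True, text=True)
osh_count = int(result.stdout.strip() or 0)
cores = os.cpu_count()
print(f"{osh_count} osh processes on {cores} cores "
      f"(~{osh_count / cores:.1f} per core, guideline <= 10)")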