DataStage (InfoSphere Information Server) Sizing

Posted: Mon Jun 09, 2014 12:55 pm
by nvalia
Hi All,

We are in the process of sizing a DataStage environment.

What inputs would help us arrive at a reasonably close estimate of the hardware requirements, i.e. number of cores and memory? (I understand it cannot be accurate, but it should be in the ballpark.)

e.g.
Data Volumes
Available Load Window within which we have to finish the cycle
...

Thanks,
VN

Posted: Mon Jun 09, 2014 1:47 pm
by chulett
You might find Eric's reply in this thread of interest.

Posted: Mon Jun 09, 2014 3:20 pm
by nvalia
To give a context - we are expecting about:

Annual Data Growth - 10-15%
15M rows to be processed daily
Medium level transformation rules complexity
Sources - Files and RDBMS
Target - Netezza
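
As a rough illustration of how the daily volume and the load window combine into one number, here is a minimal Python sketch that converts them into a required average throughput. The 15M rows per day comes from the figures above; the window lengths are hypothetical, since no load window is stated in the thread.

```python
def required_rows_per_second(rows_per_day: int, window_hours: float) -> float:
    """Average throughput needed to finish the daily load inside the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return rows_per_day / (window_hours * 3600)


DAILY_ROWS = 15_000_000  # daily volume quoted in the post

# Hypothetical load windows; the thread does not give the real one.
for hours in (2, 4, 8):
    rate = required_rows_per_second(DAILY_ROWS, hours)
    print(f"{hours}h window: about {rate:,.0f} rows/s on average")
```

This is only an average over the whole cycle. It says nothing about peaks, job dependencies, or how fast the sources and Netezza can actually deliver and accept rows.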

Posted: Mon Jun 09, 2014 5:26 pm
by ray.wurlod
By default no data are stored in DataStage, therefore data volumes do not affect sizing of DataStage itself. If you choose to use Data Sets for intermediate storage, then you need to allocate sufficient disk space for that. If you perform sorting, or lookups, etc., involving large data volumes then you need to ensure that you have plenty of scratch disk. And, of course, there must be sufficient space available in databases, including transaction logs (this is particularly true for Information Analyzer).
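
A back-of-the-envelope sketch for the scratch disk point, in Python. Everything except the 15M daily row count mentioned earlier in the thread is an assumption made for illustration: the average row width and the headroom factor are hypothetical, and this is not an IBM formula.

```python
GIB = 1024 ** 3


def scratch_bytes(rows: int, avg_row_bytes: int, headroom: float = 2.0) -> int:
    """Rough scratch space for sorting `rows` rows of a given average width.

    `headroom` is an assumed safety multiplier, not a documented DataStage value.
    """
    return int(rows * avg_row_bytes * headroom)


# 15M rows is from the thread; 500 bytes per row is a made-up example width.
estimate = scratch_bytes(rows=15_000_000, avg_row_bytes=500)
print(f"Scratch estimate: {estimate / GIB:.1f} GiB")
```

Measure real row widths and count how many sorts run concurrently before trusting a number like this.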

Posted: Mon Jun 09, 2014 9:09 pm
by nvalia
Thanks Ray. How about sizing the CPU (core count) and memory requirements? That is why I mentioned the data volumes and the complexity of the transformations.

Posted: Tue Jun 10, 2014 8:13 am
by asorrell
Sizing the server is actually a very difficult thing to do. It's almost a "How high is Up?" kind of question because there are literally dozens of factors that come into play.

In many cases I've even seen hardware vendors fall back on "how much money do you have" to size the system because all they'll do is suggest the biggest system you can afford.

Can you engage your hardware vendor to assist?

Posted: Tue Jun 10, 2014 8:21 am
by chulett
Or engage IBM, as Eric suggested in the post I linked to.

Posted: Tue Jun 10, 2014 8:49 am
by asorrell
IBM will not be very helpful unless you are going to use AIX systems... For some reason they dislike being asked how to size their competitors' hardware...

unless you pay them. Then they'll be quite helpful.

Posted: Thu Jun 12, 2014 12:22 pm
by PaulVL
You might want to decide if you want Windows, Linux or AIX. I would go with Linux, but you need to ask your system admins. If you are not an AIX shop, then it pretty much rules that out.

Next you will ask the same group: What is the minimum number of cores I can get one of those servers configured with?

I would not configure any DataStage server with less than 32GB of RAM. That stuff is cheap and useful, so go big or go home. The install guide says something like 2GB per core... that's poop. Keep it big. It's not just the DataStage jobs you have to deal with, it's the user-written shell scripts that are part of your framework. Those can get ugly.
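
To see why the per-core figure falls short on small boxes, here is a minimal Python sketch that compares it with the 32GB floor. The 2GB per core number is only the "something like" value recalled above, and taking the larger of the two is an assumption made for illustration, not an official sizing rule.

```python
def recommended_ram_gb(cores: int, gb_per_core: int = 2, floor_gb: int = 32) -> int:
    """Take the larger of the per-core guideline and a fixed minimum."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return max(floor_gb, cores * gb_per_core)


for cores in (4, 8, 16, 32):
    per_core_only = cores * 2
    print(f"{cores} cores: per-core rule {per_core_only}GB, "
          f"with floor {recommended_ram_gb(cores)}GB")
```

With these assumed numbers the per-core rule only catches up with the floor at 16 cores, so the floor decides the answer on anything smaller.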


They will pitch you Virtual Servers... Cloud blah blah... You have to choose whether you want your production box on a virtual server or not.

15M rows with 10-15% growth annually is chump change.
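
For a sense of scale, a quick Python sketch that compounds the quoted growth rates forward. The five-year horizon is a hypothetical planning period chosen for illustration; only the 15M rows and the 10-15% range come from the thread.

```python
def projected_daily_rows(rows_today: int, annual_growth: float, years: int) -> int:
    """Compound the daily row count forward by a fixed annual growth rate."""
    return round(rows_today * (1 + annual_growth) ** years)


ROWS_TODAY = 15_000_000
YEARS = 5  # assumed horizon, not stated in the thread

for growth in (0.10, 0.15):
    future = projected_daily_rows(ROWS_TODAY, growth, YEARS)
    print(f"{growth:.0%} growth: about {future / 1e6:.1f}M rows/day after {YEARS} years")
```

Even at the high end of the range the daily volume only about doubles over that period.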

Another question you have to ask your team is... once this DataStage environment kicks off and is a huge success... will others want to put their ETL processes into this environment? Meaning... YOUR growth is expected and planned, but do you think others will want to go swimming in your pool too? Your company should try to avoid setting up a unique install for each project that pops up. Too costly to do that, but at times politics trumps budget.

Posted: Thu Jun 12, 2014 2:59 pm
by qt_ky
asorrell wrote:IBM will not be very helpful unless you are going to use AIX systems...
I disagree. I searched some of my past Info Server sizing documents and found AIX was never even mentioned.