DataStage (InfoSphere Information Server) Sizing

nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

Hi All,

We are in the process of sizing a DataStage environment.

What inputs are needed to arrive at a reasonably close estimate of the hardware requirements, i.e. number of cores and memory? (I understand it cannot be accurate, but it should be in the ballpark.)

e.g.
Data Volumes
Available Load Window within which we have to finish the cycle
...

Thanks,
VN
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You might find Eric's reply in this thread of interest.
-craig

"You can never have too many knives" -- Logan Nine Fingers
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

To give some context, we are expecting about:

Annual Data Growth - 10-15%
15M rows to be processed daily
Medium level transformation rules complexity
Sources - Files and RDBMS
Target - Netezza
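
For what it's worth, the figures above can be turned into a rough projection. This is only a minimal sketch: it compounds the 15M daily rows at the 10% and 15% growth rates, and converts a daily volume into a sustained rows-per-second rate for a given load window. The function names, the 5-year horizon and the 4-hour window are assumptions for illustration and are not from this thread.

```python
def project_daily_rows(current_rows, annual_growth, years):
    """Compound the daily row count forward, one entry per year (year 0 = today)."""
    return [round(current_rows * (1 + annual_growth) ** y) for y in range(years + 1)]


def required_rows_per_second(daily_rows, window_hours):
    """Average throughput needed to process daily_rows inside the load window."""
    return daily_rows / (window_hours * 3600)


# 15M rows/day today, growing 10-15% per year (figures from the post).
# The 5-year horizon and 4-hour load window are hypothetical placeholders.
low = project_daily_rows(15_000_000, 0.10, 5)
high = project_daily_rows(15_000_000, 0.15, 5)

for year, (lo_rows, hi_rows) in enumerate(zip(low, high)):
    lo_rate = required_rows_per_second(lo_rows, 4)
    hi_rate = required_rows_per_second(hi_rows, 4)
    print(f"year {year}: {lo_rows:,} to {hi_rows:,} rows/day, "
          f"{lo_rate:,.0f} to {hi_rate:,.0f} rows/s over a 4h window")
```

This gives only an average rate. It says nothing about cores or memory on its own, since those depend on job design and transformation complexity.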
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

By default no data are stored in DataStage, therefore data volumes do not affect sizing of DataStage itself. If you choose to use Data Sets for intermediate storage, then you need to allocate sufficient disk space for that. If you perform sorting, or lookups, etc., involving large data volumes then you need to ensure that you have plenty of scratch disk. And, of course, there must be sufficient space available in databases, including transaction logs (this is particularly true for Information Analyzer).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

Thanks Ray. How about sizing the CPU (core count) and memory requirements? That is why I mentioned the data volumes and the complexity of the transformations.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Sizing the server is actually a very difficult thing to do. It's almost a "How high is Up?" kind of question because there are literally dozens of factors that come into play.

In many cases I've even seen hardware vendors fall back on "how much money do you have" to size the system because all they'll do is suggest the biggest system you can afford.

Can you engage your hardware vendor to assist?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or engage IBM, as Eric suggested in the post I linked to.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

IBM will not be very helpful unless you are going to use AIX systems... For some reason they dislike being asked how to size their competitors' hardware...

unless you pay them. Then they'll be quite helpful.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You might want to decide if you want Windows, Linux or AIX. I would go with Linux, but you need to ask your system admins. If you are not an AIX shop, then it pretty much rules that out.

Next you will ask the same group: what is the minimum number of cores I can get one of those servers configured with?

I would not configure any DataStage server with less than 32 GB of RAM. That stuff is cheap and useful, so go big or go home. The install guide says something like 2 GB per core... that's poop. Keep it big. It's not just the DataStage jobs you have to deal with, it's the user-written shell scripts that are part of your framework. Those can get ugly.
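
As a rough illustration of that rule of thumb, here is a minimal sketch that takes the larger of a 32 GB floor and the per-core figure. The function name and the choice to combine the two numbers with max() are assumptions for illustration, not an IBM formula. The 2 GB per core value is only the "something like" figure recalled above.

```python
def recommended_ram_gb(cores, gb_per_core=2, floor_gb=32):
    """Return the larger of the fixed floor and the per-core estimate, in GB."""
    return max(floor_gb, cores * gb_per_core)


# Small boxes are held at the 32 GB floor; larger ones scale with core count.
for cores in (4, 8, 16, 24, 32):
    print(f"{cores} cores: at least {recommended_ram_gb(cores)} GB RAM")
```

Treat the result as a lower bound. Shell scripts and other processes running next to the jobs need headroom on top of it.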


They will pitch you Virtual Servers... Cloud blah blah... You have to choose whether you want your production box on a virtual server or not.

15M rows with 10-15% growth annually is chump change.

Another question you have to ask your team is... once this DataStage environment kicks off and is a huge success... will others want to put their ETL processes into this environment? Meaning... YOUR growth is expected and planned, but do you think others will want to go swimming in your pool too? Your company should try to avoid setting up a unique install for each project that pops up. Too costly to do that, but at times politics trumps budget.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

asorrell wrote: IBM will not be very helpful unless you are going to use AIX systems...
I disagree. I searched some of my past Info Server sizing documents and found AIX was never even mentioned.
Choose a job you love, and you will never have to work a day in your life. - Confucius