
Buy AIX server for Datastage Dev, Test and Prod environment

Posted: Sun Sep 24, 2006 11:30 am
by avi21st
Hi

Before coming to the question, I want to provide some details of our present system architecture. We have some questions as we plan to buy new AIX servers for the test and production environments for our ETL code
(DataStage and Business Objects). Previously, AS400 machines handled all of our IT systems; the ETL concept is new in our organization and I am trying to implement it. Although we plan to keep the databases on the AS400 servers, we plan to host the DataStage Server Edition and Business Objects dev, test and prod environments on AIX. Currently we have only a development environment on AIX, where four projects are in development. The current system is a 2-CPU, 60 GB machine for the initial implementation.

I have got some information from the DataStage guide, and on that basis we have set up our temporary development environment.

My question: for each environment (Development, Test and Production), how do we determine the following?


1. How many CPUs should we have for each environment?

2. How many nodes do we need?

3. How much disk space (temp space, free space, etc.) for each environment?

4. What other components should we consider when buying an AIX server for DataStage?

Please provide your valuable suggestions.

Posted: Sun Sep 24, 2006 11:43 am
by ArndW
The scaling of the new AIX machines is a complex matter, and the information given so far isn't enough to come close to an answer. The actual load that production will carry plays a very big role, and you haven't touched on that issue yet.

If you go for the P-Series architecture, you can scale the number of CPUs in use between the different logical "nodes" within the same frame and dynamically shift those processors around in real time. To a limited degree you can do the same with memory.

Without any additional information, you could take your current speeds on the AS400 machines and use them as a baseline; i.e. "we execute 40 jobs daily over n GB of data or n million rows, and on the current hardware it takes x hours to run." Then you need to find out what your acceptable runtimes in production are, and use comparative MIPS numbers between your AS400 and the future AIX systems to come up with a factor that should, theoretically, give you expected CPU and memory and perhaps even disk capacities.
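That baseline-and-factor approach can be sketched as a back-of-the-envelope calculation. Every figure below (runtimes, CPU counts, the MIPS ratio) is a placeholder you would replace with your own measurements, not a number from this thread:

```shell
#!/bin/sh
# Back-of-the-envelope CPU sizing: scale today's AS400 batch workload by a
# MIPS-style ratio to estimate AIX CPUs. All figures are illustrative.

AS400_CPUS=2            # CPUs carrying the batch load today (assumed)
AS400_RUNTIME_HRS=10    # observed nightly batch runtime (assumed)
TARGET_RUNTIME_HRS=4    # acceptable production window (assumed)
MIPS_RATIO=2            # one AIX CPU ~= this many AS400 CPUs (assumed)

# Total work in AS400 CPU-hours, then divide by what one AIX CPU can
# deliver inside the target window, rounding up.
WORK=$(( AS400_CPUS * AS400_RUNTIME_HRS ))
PER_CPU=$(( TARGET_RUNTIME_HRS * MIPS_RATIO ))
NEEDED=$(( (WORK + PER_CPU - 1) / PER_CPU ))

echo "Estimated AIX CPUs: $NEEDED"    # prints 3 with the numbers above
```

As Arnd says, none of this will be close to reality; it only gives a starting ballpark that a loaner machine or IBM test center can then validate.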

In reality none of this is going to even be close, but it will give you a ballpark figure to start with. Realistically your only hope of sizing will be to get a test machine from IBM or use one of their test centers (I don't know where the US ones are located, but there should be one relatively close by). That is the only way you will get semi-reliable sizings.

If you don't have your DataStage ETL in production yet you will need to do quite a bit more testing and gathering of specifications before continuing.

Posted: Sun Sep 24, 2006 4:02 pm
by ray.wurlod
Just to back up what Arnd said, there is no point using benchmark information other than for your own ETL jobs. No-one else does precisely what you do with your data. Using an IBM test center, or borrowing a machine on which to conduct some benchmarks, is a good plan.

Posted: Sun Sep 24, 2006 4:36 pm
by vmcburney
In your source and target databases you tend to store data just once. On your ETL server you store it many times, depending on how many rollback points you have, whether you have a dataset staging area, how often you archive, etc. Data saved to datasets tends to take up more space than database tables because you duplicate reference data across nodes. If you plan on doing full table loads on a regular basis, your ETL server needs several times more disk space than your source or target databases. If you keep to delta loads you can reduce this somewhat.
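As a rough illustration of that multiplier effect, here is a sizing sketch. The copy count and scratch headroom are hypothetical assumptions, not figures from this thread:

```shell
#!/bin/sh
# Disk sizing sketch: an ETL server holds several copies of the source data
# (rollback points, dataset staging areas) plus temp/scratch headroom.
# All numbers are illustrative placeholders.

SOURCE_GB=60        # source data volume (assumed)
COPIES=3            # rollback points + staging copies kept at once (assumed)
SCRATCH_PCT=50      # extra temp/scratch headroom, as a percentage (assumed)

STAGED_GB=$(( SOURCE_GB * COPIES ))
TOTAL_GB=$(( STAGED_GB + STAGED_GB * SCRATCH_PCT / 100 ))

echo "Plan for roughly ${TOTAL_GB} GB of ETL disk"   # 270 GB with these inputs
```

With delta loads rather than full table loads, COPIES (and therefore the total) drops considerably.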

It is a good idea to get a lot of RAM. When you add a CPU you add a DataStage license fee; when you add RAM there are no extra licensing fees, so RAM is a cheap way of getting more performance out of a server.

Why buy a UNIX server over a Windows server for DataStage

Posted: Mon Sep 25, 2006 10:58 am
by avi21st
Thanks for all your suggestions.

All the projects that I have worked on used AIX or Solaris (i.e. UNIX-based) servers for DataStage. Now my boss is asking me a question:

Why don't we use a Windows server for the DataStage implementation? What is the advantage of DataStage on UNIX over DataStage on Windows?

We also have a very important decision to make, and I would request your help.

We currently have the structure shown below in the development environment on AIX. We are planning to buy servers for Test and Production, and management is thinking of moving everything to a Windows server.

My opinion is: first, our license is for AIX, not Windows. Secondly, we do not have a DataStage/ETL scheduling tool, so I am using UNIX shell-based scheduling and automation for the project. And the last point: as I am designing the ETL process and scheduling, I have never worked on Windows-based .bat scripts :lol: - kidding.
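For shell-based scheduling, one common pattern is to wrap the dsjob command-line client in a script that cron invokes. The sketch below only exercises the status check; the DSEngine path and the project/job names are placeholders for your own install, and the live dsjob call is commented out since it needs a running DataStage engine:

```shell
#!/bin/sh
# Wrapper sketch around the DataStage dsjob CLI for cron-driven scheduling.
# Paths and names below are placeholders, not taken from this thread.

# dsjob -run -jobstatus waits for the job and conventionally exits with
# 1 for "finished OK" and 2 for "finished with warnings".
check_status() {
    case "$1" in
        1|2) return 0 ;;   # job usable: OK or finished with warnings
        *)   return 1 ;;   # aborted, not compiled, not runnable, etc.
    esac
}

# Invocation sketch (requires a DataStage install; placeholders throughout):
#   DSHOME=/opt/Ascential/DataStage/DSEngine
#   "$DSHOME/bin/dsjob" -run -jobstatus MyProject LoadCustomers
#   check_status $? || { echo "LoadCustomers failed" >&2; exit 1; }

check_status 1 && echo "status 1 -> OK"
```

A crontab entry then just calls the wrapper at the scheduled time, and the non-zero exit on failure is what the exception-management layer hooks into.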

Anyway, how I planned the architecture is shown in the figure below.

Please give me some valid reasons why most people use Unix as the server for Datastage ETL implementation.

Are there examples of people moving from DataStage on Windows to DataStage on UNIX?


This is what I told them. The Automation and Exception Management (AEM) strategy which I have designed is also Unix shell based.

[Image: proposed environment architecture diagram]

Posted: Mon Sep 25, 2006 3:43 pm
by ray.wurlod
In a word: robustness.

Windows needs to be rebooted too frequently (in my opinion) to be considered for a production environment. UNIX rarely needs to be rebooted, mainly after hardware or operating system upgrades.