Buy AIX server for Datastage Dev, Test and Prod environment

avi21st
Charter Member
Posts: 135
Joined: Thu May 26, 2005 10:21 am
Location: USA

Post by avi21st »

Hi

Before getting to the question, I want to provide some details of our present system architecture. We have some questions as we plan to buy new AIX servers for the test and production environments for our ETL code (DataStage and Business Objects). Previously we had AS400 machines handling all IT systems. The ETL concept is new in our organization and I am trying to implement it. Although we plan to keep the databases on the AS400 servers, we plan to have the DataStage Server Edition and Business Objects dev, test and prod environments on AIX. Currently we have only a development environment on AIX, where four projects are in development. The current system is a 2-CPU, 60 GB system for the initial implementation.

I got some information from the DataStage guide, and our temporary development environment is based on that.

My question:
For each environment (Development, Test and Prod), how do we determine the following:

1> How many CPUs we should have for each environment

2> How many nodes we need

3> How much disk space (temp, free space, etc.) we need for each environment

4> What other components we need to think of while buying an AIX server for DataStage

Please provide your valuable suggestions.
Avishek Mukherjee
Data Integration Architect
Chicago, IL, USA.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The scaling of the new AIX machines is a complex matter and the information given so far isn't quite enough to come close. The actual load that production is going to have plays a very big role and you haven't touched upon that issue yet.

If you go for the P-Series architecture, you can scale the number of CPUs in use between the different logical "nodes" within the same frame and dynamically shift these processors around in real time. To a limited degree you can do the same with memory.

Without any additional information, you could take your current speeds on the AS400 machines and use them as a baseline, i.e. saying you will execute 40 jobs daily for n GB of data or n million rows, and on the current hardware it takes x hours to run. Then you need to find out what your acceptable runtimes in production are and use comparative MIPS numbers between your AS400 and future AIX systems to come up with a factor that should, theoretically, give you expected CPU and memory and perhaps even disk capacities.
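As a rough illustration of the ballpark arithmetic described above, here is a minimal Python sketch. The function name `required_cpus`, its parameters, and the assumption that throughput scales linearly with CPU count and with a per-CPU speed ratio are hypothetical simplifications for illustration only; nothing here comes from DataStage or IBM tooling.

```python
import math


def required_cpus(current_cpus, baseline_hours, target_hours, speed_ratio):
    """Ballpark CPU count for the new machine.

    current_cpus   -- CPUs on the existing (baseline) system
    baseline_hours -- how long the daily ETL load takes today
    target_hours   -- acceptable runtime in production
    speed_ratio    -- assumed per-CPU speed of new vs. old hardware
                      (e.g. derived from comparative MIPS figures)

    Assumes runtime scales linearly with CPU count and CPU speed,
    which real workloads rarely do, so treat the result as a
    starting point only.
    """
    if min(current_cpus, baseline_hours, target_hours, speed_ratio) <= 0:
        raise ValueError("all inputs must be positive")
    # How much faster the whole load must run than it does today.
    speedup_needed = baseline_hours / target_hours
    # Faster CPUs reduce the number of CPUs needed for that speedup.
    return math.ceil(current_cpus * speedup_needed / speed_ratio)
```

With invented numbers, a load that takes 6 hours on 2 CPUs, must finish in 3 hours, and moves to CPUs assumed to be 1.5 times faster gives `required_cpus(2, 6.0, 3.0, 1.5)`, which rounds up to 3 CPUs.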

In reality none of this is going to even be close, but it will give you a ballpark figure to start with. Realistically your only hope of sizing will be to get a test machine from IBM or use one of their test centers (I don't know where the US ones are located, but there should be one relatively close by). That is the only way you will get semi-reliable sizings.

If you don't have your DataStage ETL in production yet you will need to do quite a bit more testing and gathering of specifications before continuing.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just to back up what Arnd said, there is no point using benchmark information other than for your own ETL jobs. No-one else does precisely what you do with your data. Using an IBM test center, or borrowing a machine on which to conduct some benchmarks, is a good plan.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

In your source and target databases you tend to store data just once. On your ETL server you store it many times, depending on how many rollback points you have, whether you have a dataset staging area, how often you archive, etc. Data saved to datasets tends to take up more space than database tables, as you duplicate reference data across nodes. If you plan on doing full table loads on a regular basis, your ETL server needs to have several times more disk space than your source or target databases. If you are keeping to delta loads you can reduce this somewhat.
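To make the "stored many times" point concrete, here is a minimal Python sketch of the multiplication involved. The function name `etl_disk_estimate_gb`, the three copy counts, and the 25% default headroom are hypothetical choices for illustration, not recommended values; it also ignores the extra space datasets take from duplicating reference data across nodes.

```python
def etl_disk_estimate_gb(load_gb, rollback_copies, staging_copies,
                         archive_copies, headroom=0.25):
    """Rough ETL-server disk estimate in GB.

    load_gb         -- size of one load (full table load or delta)
    rollback_copies -- copies kept for rollback points
    staging_copies  -- copies held in the staging area
    archive_copies  -- copies retained before archives are purged
    headroom        -- assumed fraction of extra free/temp space
    """
    if load_gb < 0 or headroom < 0:
        raise ValueError("load_gb and headroom must be non-negative")
    copies = rollback_copies + staging_copies + archive_copies
    if copies < 0:
        raise ValueError("copy counts must not be negative overall")
    # Every retained copy costs one full load's worth of space.
    return load_gb * copies * (1 + headroom)
```

With invented numbers, a 100 GB full load kept as 2 rollback copies, 1 staging copy and 1 archive copy comes to `etl_disk_estimate_gb(100, 2, 1, 1)`, i.e. 500 GB; passing a smaller delta size as `load_gb` shows how delta loads shrink the figure.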

It is a good idea to get a lot of RAM. When you add a CPU you add a DataStage license fee. When you add RAM you add no extra licensing fees, so RAM is a cheap way of getting more performance out of a server.
avi21st
Charter Member
Posts: 135
Joined: Thu May 26, 2005 10:21 am
Location: USA

Why buy Unix server over Windows server for Datastage

Post by avi21st »

Thanks for all your suggestions.

All the projects that I have worked on used AIX or Solaris, i.e. a UNIX-based server, for DataStage. Now my boss is asking me a question:

Why do we not use a Windows server for the DataStage implementation? What is the advantage of DataStage on UNIX over DataStage on Windows?

We also have a very important decision to make, and I would request your help.

Our current structure in the development environment on AIX is shown in the figure below. We are planning to buy servers for Test and Production, and they are thinking of moving everything to Windows servers.

My opinion is: first, our license is for AIX, not Windows. Secondly, we do not have a DataStage/ETL scheduling tool, so I am using Unix shell based scheduling and automation for the project. And the last point: as I am designing the ETL process and scheduling, I have never worked with Windows .bat scripts :lol: - kidding.

Anyway, the architecture I planned is shown in the figure below.

Please give me some valid reasons why most people use Unix as the server for DataStage ETL implementations.

Is there an example of people with DataStage on Windows moving to DataStage on Unix?


This is what I told them. The Automation and Exception Management (AEM) strategy which I have designed is also Unix shell based.

[Image: architecture figure, not preserved]
Avishek Mukherjee
Data Integration Architect
Chicago, IL, USA.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In a word: robustness.

Windows needs to be re-booted too frequently (in my opinion) to be considered in a production environment. UNIX rarely needs to be re-booted - primarily after upgrading of hardware or operating system.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.