I have searched the Ascential / IBM site, as well as the forums and I can't find any information about how Datastage can be used in a high availability scenario.
I believe there are some additional products / licences required for Datastage to run with some kind of failover capability. Can anyone shed any light on this, or better still has anyone built such a solution?
Thanks
Paul
High Availability
Searching for 'failover' returned 27 hits; not all are relevant, of course, but threads like this one are. It discusses failover, something I was able to help set up at one site.
There's nothing built into DataStage to make it qualify as 'Highly Available' or provide anything like transparent failover. And no additional products or licenses will make it happen that I am aware of.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I did a couple of HACMP and other automated failover projects with the underlying database system, and none of the solutions we ever came up with was truly zero-downtime and invisible to the users. Trying to make a DataStage project runtime environment fail over gracefully is going to be a lot of work. Since the DWH environment is usually based around discrete load jobs, there is usually no need to go beyond the job level of granularity.
We are investigating this too.
Talking to IBM, they offer a way of doing this utilizing a Linux grid. For the initial setup they send someone out from Professional Services to set up the hardware and supply you with a "grid toolkit". I still don't know everything that is involved yet; as we travel down this road I will post updates as I can.
There have been high availability sessions at the last two conferences. It centres around the RTI pack which can automatically allocate jobs to available servers. If one server goes down or is fully loaded it allocates the job to another server. It does come at an extra cost, though I have no idea how much the RTI pack costs. You can SOA enable either a server or parallel job for high availability.
I have built an automatic job retry for DataStage sequences where jobs that abort retry a given number of times. This only works for jobs that do not risk duplicate database changes. By banding the jobs into a prepare side and a load side we can make the prepare side jobs, which do 90% of the transformation, fully restartable and recoverable. The first time I did it the support staff received text messages 24x7 indicating restarts were being attempted and whether they had been successful. That was an early version of DataStage that often had unexpected aborts.
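The retry pattern described above can be sketched as a shell wrapper. This is a minimal sketch, not Vincent's actual implementation; the dsjob invocation at the bottom is commented out and its project and job names are hypothetical:

```shell
#!/bin/sh
# Retry a command up to a given number of times.
# As noted above, only use this for jobs that are safe to re-run,
# i.e. jobs with no risk of duplicate database changes.
run_with_retry() {
    max_tries=$1; shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            echo "attempt $attempt succeeded"
            return 0
        fi
        echo "attempt $attempt failed, retrying..."
        attempt=$((attempt + 1))
    done
    echo "all $max_tries attempts failed"
    return 1
}

# Hypothetical DataStage invocation -- project/job names are examples only:
# run_with_retry 3 dsjob -run -jobstatus MyProject LoadCustomerDim
```

A real version would add a pause between attempts and hook the success/failure messages into whatever alerting you use (text messages, in the setup described above).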
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
vmcburney wrote: There have been high availability sessions at the last two conferences. It centres around the RTI pack which can automatically allocate jobs to available servers. If one server goes down or is fully loaded it allocates the job to another server. It does come at an extra cost, though I have no idea how much the RTI pack costs. You can SOA enable either a server or parallel job for high availability.

Make sure you understand how this all works. This 'high availability' is only for 'SOA enabled' jobs - and the vast majority of what you would have out there in the wild now is not 'SOA enable'able, either at all or easily.
The load balancing and failover only applies to the RTI Server component - the WebSphere/WebLogic/JBoss piece that handles the front end. Behind the scenes is still a plain old DataStage server running an extra little piece called the RTI Agent. When it goes down, either DataStage or the RTI Agent, it goes down and nothing on the RTI side can prevent that. What you need to have any kind of failover on the backend are multiple running DataStage servers for RTI to load balance across. And the failover is not transparent.
Just wanted to make those points before someone thought of RTI as some kind of HA Silver Bullet.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I don't know why you'd want to install on HACMP, but if you do, the process on v8 is as follows:
Log in to the master node, which should be pointing to node1, and change the hostname to nodeMaster using smit.
Do a standard install of DataStage as per the documented instructions, then change the hostname back to node1.
Do a global replace in the serverindex.xml file from host="node1" to host="nodeMaster" (about 5 entries).
On node2, include the RPC services entry in /etc/services, create the subdirectory /tmp/rt, and set up the UNIX dsadm user. Create the /.dshome entry.
Configure startup scripts to change default.apt (or any other required apt files) to use the correct node, i.e. fastname="node1" or fastname="node2".
Swapping nodes involves sourcing the dsenv environment, then stopping and starting the DataStage services.
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.