I have searched the Ascential / IBM site, as well as the forums and I can't find any information about how Datastage can be used in a high availability scenario.
I believe there are some additional products / licences required for Datastage to run with some kind of failover capability. Can anyone shed any light on this, or better still has anyone built such a solution?
Thanks
Paul
High Availability
Searching for 'failover' returned 27 hits; not all are relevant, of course, but threads like this one are. It discusses failover, something I was able to help set up at one site.
There's nothing built into DataStage to make it qualify as 'Highly Available' or provide anything like transparent failover. And no additional products or licenses will make it happen that I am aware of.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I did a couple of HACMP and other automated failover projects with the underlying database system, and none of the solutions we ever came up with was truly zero-downtime and invisible to the users. Trying to make a DataStage project runtime environment fail over gracefully is going to be a lot of work. Since the DWH environment is usually based around discrete load jobs, there is usually no need to go beyond the job level of granularity.
We are investigating this too.
Talking to IBM, they offer a way of doing this utilizing a Linux grid. For the initial setup they send someone out from Professional Services to set up the hardware and supply you with a "grid toolkit". I still don't know everything that is involved yet; as we travel down this road I will post updates as I can.
There have been high availability sessions at the last two conferences. It centres around the RTI pack which can automatically allocate jobs to available servers. If one server goes down or is fully loaded it allocates the job to another server. It does come at an extra cost, though I have no idea how much the RTI pack costs. You can SOA enable either a server or parallel job for high availability.
I have built an automatic job retry for DataStage sequences where jobs that abort retry a given number of times. This only works for jobs that do not risk duplicate database changes. By banding the jobs into a prepare side and a load side we can make the prepare side jobs, which do 90% of the transformation, fully restartable and recoverable. The first time I did it the support staff received text messages 24x7 indicating restarts were being attempted and whether they had been successful. That was an early version of DataStage that often had unexpected aborts.
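The retry pattern described above can be sketched as a shell wrapper. This is a minimal sketch, not Vincent's actual implementation; the dsjob invocation at the bottom is commented out and its project and job names are hypothetical:

```shell
#!/bin/sh
# Retry a command up to a given number of times.
# As noted above, only use this for jobs that are safe to re-run,
# i.e. jobs with no risk of duplicate database changes.
run_with_retry() {
    max_tries=$1; shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            echo "attempt $attempt succeeded"
            return 0
        fi
        echo "attempt $attempt failed, retrying..."
        attempt=$((attempt + 1))
    done
    echo "all $max_tries attempts failed"
    return 1
}

# Hypothetical DataStage invocation -- project/job names are examples only:
# run_with_retry 3 dsjob -run -jobstatus MyProject LoadCustomerDim
```

A real version would add a pause between attempts and hook the success/failure messages into whatever alerting you use (text messages, in the setup described above).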
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
vmcburney wrote: There have been high availability sessions at the last two conferences. It centres around the RTI pack which can automatically allocate jobs to available servers. If one server goes down or is fully loaded it allocates the job to another server. It does come at an extra cost, though I have no idea how much the RTI pack costs. You can SOA enable either a server or parallel job for high availability.

Make sure you understand how this all works. This 'high availability' is only for 'SOA enabled' jobs - and the vast majority of what you would have out there in the wild now is not 'SOA enable'able, either at all or easily.
The load balancing and failover only applies to the RTI Server component - the WebSphere/WebLogic/JBoss piece that handles the front end. Behind the scenes is still a plain old DataStage server running an extra little piece called the RTI Agent. When it goes down, either DataStage or the RTI Agent, it goes down and nothing on the RTI side can prevent that. What you need to have any kind of failover on the backend are multiple running DataStage servers for RTI to load balance across. And the failover is not transparent.
Just wanted to make those points before someone thought of RTI as some kind of HA Silver Bullet.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I don't know why you'd want to install on HACMP, but if you do, the process on v8 is as follows:
Log in to the master node, which should be pointing to node1, and change the hostname to nodeMaster using smit.
Do a standard install of DataStage as per the documented instructions, then change the hostname back to node1.
Do a global replace in the serverindex.xml file from host="node1" to host="nodeMaster" (about 5 entries).
On node2, include the RPC services entry in /etc/services, create the subdirectory /tmp/rt, and set up the UNIX dsadm user. Create the /.dshome entry.
Configure startup scripts to change default.apt (or any other required apt files) to use the correct node, i.e. fastname="node1" or fastname="node2".
Swapping nodes involves sourcing the dsenv environment, then stopping and starting the DataStage services.
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.