Installing Datastage 8.7 on Hadoop Cluster

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

pg_smh
Participant
Posts: 15
Joined: Sat Feb 10, 2007 3:31 am

Installing Datastage 8.7 on Hadoop Cluster

Post by pg_smh »

Hi,

I wanted to check if anyone has installed DS8.7 on a Hadoop cluster.

Can we put the $DSHOME directory on HDFS and leverage the power of Hadoop's distributed computing?

I intend to have the datasets and source/target files on the Hadoop cluster anyway. Just wondering whether putting $DSHOME on HDFS gains me any additional leverage, or whether it becomes a bottleneck.
felixyong
Participant
Posts: 35
Joined: Tue Jul 22, 2003 7:24 pm
Location: Australia

Re: Installing Datastage 8.7 on Hadoop Cluster

Post by felixyong »

Hi

DataStage (parallel jobs) is already distributed computing in itself, as it is built on a shared-nothing parallel architecture.

We were doing this even before Hadoop existed, via MPP or grid architectures.

Putting it on Hadoop would just make it slower, since we would be going through another layer compared to what it already provides natively.
Regards
Felix
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This functionality will be available in the next version (the one after 11.3.1). The engine executes directly on the Hadoop cluster, with "grid management" handled through YARN. This technology is currently in beta testing, and seems to be standing up well.

I think that putting $DSHOME on one node of an HDFS cluster would be self-defeating; it would not get the three-way replication that other artifacts in HDFS automatically get, so it would not be protected against node failure.

Nor do I think it's a good idea to place Data Set storage on HDFS, for similar reasons.

And it's definitely not a good idea to place scratch disk there; scratch disk should always be on local disk if possible.
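The distinction between Data Set and scratch storage is made in the parallel engine's configuration file ($APT_CONFIG_FILE). As a minimal sketch (hostname and paths are hypothetical), both `resource disk` and `resource scratchdisk` would point at local filesystem paths rather than HDFS mounts:

```
{
    node "node1"
    {
        fastname "etlhost1"
        pools ""
        /* Data Set segment files: local disk, not HDFS */
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        /* sort/buffer spill space: fast local disk */
        resource scratchdisk "/scratch/ds" {pools ""}
    }
}
```

Scratch disk in particular sees heavy small-block random I/O from sorts and buffering, which is a poor fit for HDFS's large-block, write-once design.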
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Agree with Ray - you should wait for the DataStage version that is certified to run on Hadoop. It will have a different license model, it will be much cheaper per processing node, and it will be running natively.

DataStage 11.3.1 has the DataStage Hadoop file stage, so if you upgrade to that version you can at least start using it for source and target files.
pg_smh
Participant
Posts: 15
Joined: Sat Feb 10, 2007 3:31 am

DS 11.5 certified to run on Hadoop

Post by pg_smh »

Hi,

Just wanted to check if anyone has had a chance to work with the latest DataStage offering, DS 11.5, on a Hadoop cluster.

Just trying to assess how easy or difficult it will be to integrate with an existing Hadoop cluster; any references anyone can share would be appreciated.
cyclogenisis
Premium Member
Posts: 48
Joined: Wed Jan 07, 2015 3:30 pm

Post by cyclogenisis »

I believe what you are looking for is BigIntegrate. It is the big-data version of DataStage, using the DataStage engine running natively on YARN.