Installing Datastage 8.7 on Hadoop Cluster

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

pg_smh
Participant
Posts: 15
Joined: Sat Feb 10, 2007 3:31 am

Installing Datastage 8.7 on Hadoop Cluster

Post by pg_smh »

Hi,

I wanted to check if anyone has installed DS8.7 on a Hadoop cluster.

Can we put the $DSHOME directory on HDFS and leverage the power of Hadoop's distributed computing?

I intend to have the datasets and source/target files on the Hadoop cluster anyway. Just wondering whether putting $DSHOME on HDFS gains me any additional leverage, or whether it becomes a bottleneck.
felixyong
Participant
Posts: 35
Joined: Tue Jul 22, 2003 7:24 pm
Location: Australia

Re: Installing Datastage 8.7 on Hadoop Cluster

Post by felixyong »

Hi

DataStage (parallel jobs) is already distributed computing in itself, as it is built on a shared-nothing parallel architecture.

We were doing this even before Hadoop existed, via MPP or grid architectures.

Putting it on Hadoop would just make it slower, since we would be going through another layer compared to what it already provides natively.
Regards
Felix
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This functionality will be available in the next version (the one after 11.3.1). The engine executes directly on the Hadoop cluster, with "grid management" handled through YARN. This technology is currently in beta testing, and seems to be standing up well.

I think that putting $DSHOME on one node of an HDFS cluster would be self-defeating; it would not get the three-way replication that other artifacts in HDFS automatically get, so it would not be protected against node failure.

Nor do I think it's a good idea to place Data Set storage on HDFS, for similar reasons.

And it's definitely not a good idea to place scratch disk there; scratch disk should always be on local disk if possible.
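The distinction between Data Set and scratch storage is made in the parallel engine's configuration file ($APT_CONFIG_FILE). As a minimal sketch (hostname and paths are hypothetical), both `resource disk` and `resource scratchdisk` would point at local filesystem paths rather than HDFS mounts:

```
{
    node "node1"
    {
        fastname "etlhost1"
        pools ""
        /* Data Set segment files: local disk, not HDFS */
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        /* sort/buffer spill space: fast local disk */
        resource scratchdisk "/scratch/ds" {pools ""}
    }
}
```

Scratch disk in particular sees heavy small-block random I/O from sorts and buffering, which is a poor fit for HDFS's large-block, write-once design.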
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Agree with Ray - you should wait for the DataStage version that is certified to run on Hadoop. It will have a different license model, it will be much cheaper per processing node, and it will be running natively.

DataStage 11.3.1 has the DataStage Hadoop file stage, so if you upgrade to that version you can at least start using it for source and target files.
pg_smh
Participant
Posts: 15
Joined: Sat Feb 10, 2007 3:31 am

DS 11.5 certified to run on Hadoop

Post by pg_smh »

Hi,

Just wanted to check if anyone has had a chance to work with the latest DataStage offering, DS 11.5, on a Hadoop cluster.

Just trying to assess how easy or difficult it will be to integrate with an existing Hadoop cluster; any references anyone can share would be appreciated.
cyclogenisis
Premium Member
Posts: 48
Joined: Wed Jan 07, 2015 3:30 pm

Post by cyclogenisis »

I believe what you are looking for is BigIntegrate. It is the big-data version of DataStage, using the DataStage engine running natively on YARN.