Installing Datastage 8.7 on Hadoop Cluster

Posted: Tue Feb 17, 2015 6:59 pm
by pg_smh
Hi,

I wanted to check if anyone has installed DS8.7 on a Hadoop cluster.

Can we put the $DSHOME directory on HDFS and leverage the power of Hadoop's distributed computing?

I intend to have the datasets and source/target files on the Hadoop cluster anyhow. I am just wondering whether putting $DSHOME on HDFS gets me any additional leverage, or whether it becomes a bottleneck.

Re: Installing Datastage 8.7 on Hadoop Cluster

Posted: Wed May 20, 2015 11:34 pm
by felixyong
Hi

DataStage (Parallel Jobs) is already "distributed" computing in its own right, as it is built on a shared-nothing parallel architecture.

We were doing this via MPP or grid architectures even before Hadoop existed.

Putting it on Hadoop will just make it slower, since everything has to go through another layer compared to what DataStage already provides natively.

Posted: Thu May 21, 2015 5:00 pm
by ray.wurlod
This functionality will be available in the next version (the one after 11.3.1). The engine executes directly on the Hadoop cluster, with "grid management" handled through YARN. This technology is currently in beta testing, and seems to be standing up well.

I think that putting $DSHOME on one node of an HDFS cluster would be self-defeating; it would not get the three-way replication that other artifacts in HDFS automatically get, so it would not be insured against node failure.

Nor do I think it's a good idea to place Data Set storage on HDFS, for similar reasons.

And it's definitely not a good idea to place scratch disk there; scratch disk should always be on local disk if possible.
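
To illustrate the point about keeping Data Set storage and scratch disk off HDFS, here is a minimal sketch of a parallel configuration file (the file named by APT_CONFIG_FILE) in which both resources sit on local file systems. The node names, host names and directory paths are hypothetical and would need to match your own environment.

```text
/* Hypothetical two-node configuration; hosts and paths are assumptions. */
{
  node "node1"
  {
    fastname "etlhost1"
    pools ""
    /* Data Set storage on a local file system, not on HDFS */
    resource disk "/data/ds/datasets" {pools ""}
    /* Scratch space on local disk */
    resource scratchdisk "/scratch/ds" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost2"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/scratch/ds" {pools ""}
  }
}
```

Each node entry points at directories on that host's own disks, so sort and buffer spill traffic never has to cross the network or an extra file system layer.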

Posted: Mon Jun 01, 2015 1:04 am
by vmcburney
Agree with Ray - you should wait for the DataStage version that is certified to run on Hadoop. It will have a different license model, it will be much cheaper per processing node, and it will be running natively.

DataStage 11.3.1 has the DataStage Hadoop file stage, so if you upgrade to that version you can at least start using it for source and target files.

DS 11.5 certified to run on Hadoop

Posted: Tue Dec 01, 2015 3:27 pm
by pg_smh
Hi,

Just wanted to check if anyone has had a chance to work with the latest DataStage offering, DS 11.5, on a Hadoop cluster.

I am trying to assess how easy or difficult it will be to integrate it with an existing Hadoop cluster, and would appreciate any references that anyone can share.

Posted: Fri Dec 04, 2015 3:38 pm
by cyclogenisis
I believe what you are looking for is BigIntegrate. It is the big data version of DataStage: it uses the DataStage engine and runs natively on YARN.