
Hadoop and DataStage EE

Posted: Thu Nov 04, 2010 7:00 am
by daignault
Hadoop - http://en.wikipedia.org/wiki/Hadoop


I'm looking at playing with the Hadoop API and creating a buildop for write operations. Has anyone out there worked with Hadoop?

I've created buildops before, so I'm fine on that side of the ledger. I suspect that with Hadoop I'll need to pay attention to partitioning on the outbound connector.

Just to clarify, at present I'm only interested in updating the HDFS file system, not in MapReduce, etc.
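On the partitioning point: HDFS permits only one writer per file at a time, so a parallel write buildop would typically have each partition write its own file under a shared output directory rather than all partitions appending to one file. Below is a minimal sketch, assuming a hypothetical `partitionPath` helper and a `part-NNNNN` naming scheme similar to MapReduce output files (neither is a DataStage or Hadoop API). The partition index itself would come from the parallel framework; how a buildop obtains it is left out here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a DataStage or Hadoop API): builds a
// per-partition target path such as /data/out/part-00003, so that no
// two partitions ever open the same HDFS file for writing.
std::string partitionPath(const std::string& dir, int partition) {
    char suffix[16];
    // Zero-pad the partition index to five digits, e.g. 3 -> "part-00003".
    std::snprintf(suffix, sizeof(suffix), "part-%05d", partition);

    std::string path = dir;
    // Add a separator only if the directory does not already end in one.
    if (path.empty() || path.back() != '/') {
        path += '/';
    }
    return path + suffix;
}
```

With a four-way partitioned job, partitions 0 to 3 would write `/data/out/part-00000` through `/data/out/part-00003`, and downstream readers can treat the directory as the logical dataset.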

Thanks for any insight.

Regards,

Ray D

Posted: Thu Nov 04, 2010 9:48 am
by ray.wurlod
Stay tuned to IBM for much more coming around Hadoop and processing unstructured data generally, particularly using InfoSphere Streams. One of the two big themes for the coming year for Big Blue is handling the 80% of data that are unstructured.

Hadoop is now part of V8.7

Posted: Sat May 26, 2012 6:41 pm
by akrish1982
Hadoop integration is now part of the IIS suite. Hadoop is very important in large-scale computing today.

http://datastageetlexpert.blogspot.com/ ... adoop.html

Posted: Sat May 26, 2012 11:52 pm
by ray.wurlod
Indeed. As noted in a different thread right here on DSXchange, there is a Big Data stage available for version 8.7, which is essentially a Sequential File stage that connects to a Hadoop file system. So you still get all the benefits of the STREAMS I/O module, with straightforward access to Hadoop.

Posted: Sun May 27, 2012 1:15 pm
by PaulVL
Mr Daignault's company is bogged down with red tape and only has 8.1 and 8.5 installed in an ETL environment.

Getting an 8.7 environment will be another year away, if that.

Posted: Sun May 27, 2012 5:15 pm
by ray.wurlod
Meh.

Try working with Defence and Taxation authorities. I'm doing both at the moment. MUCH waiting on bureaucratic processes and even then what they give you may not be what you asked for.

Don't get me started.