Access Big Data, AVRO, and ORC from File Connector

Posted: Fri Oct 14, 2016 6:02 am
by deesh
Hi,

Has anyone worked with Big Data, AVRO, ORC, or Sequence files through the File Connector stage? If yes, please tell me how to develop the job and which installations are required.

Posted: Fri Oct 14, 2016 8:30 am
by asorrell
IBM does not support accessing Big Data files from version 8.0. You must upgrade to version 11.5. Then the documentation below on the Big Data File Stage will be relevant:

http://www.ibm.com/support/knowledgecen ... Stage.html

You may also need to apply this patch to add AVRO / ORC support to the File Connector:
http://www-01.ibm.com/support/docview.w ... wg24041535

Vik has some nice documentation on using ORC and AVRO at 11.5:
https://www.linkedin.com/pulse/avro-orc ... r-malhotra

Your only alternative for AVRO at release 8.0 is to build a Java interface of some sort for it, since AVRO files are read and written through an API and their schemas are defined in JSON. I found something on DeveloperWorks for it:

DataStage Java Pack sample: This sample shows you how to use the Avro API to read and write an Avro data file with an Avro schema in a Java Transformer user class
https://www.ibm.com/developerworks/comm ... 95a7d64a2e.
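
For context on the JSON connection: an Avro schema is written as a JSON document, while the data file itself is binary. The sketch below is only an illustration of what such a schema looks like. It is in Python for brevity, uses just the standard json module, and does not read or write Avro data. The Customer schema and the field_types helper are hypothetical, made up for this example, and are not taken from the linked sample.

```python
import json

# Hypothetical Avro record schema, made up for illustration.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
"""


def field_types(schema_text):
    """Return {field name: declared Avro type} for a record schema."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return {field["name"]: field["type"] for field in schema["fields"]}


if __name__ == "__main__":
    # List each column the schema declares, with its Avro type.
    for name, avro_type in field_types(SCHEMA_JSON).items():
        print(name, avro_type)
```

A real job would hand a schema like this to the Avro library (for example from a Java Transformer class, as in the sample above) rather than parse it by hand.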

That might work at 8.0. If you upgrade to a newer release and still want an API interface, you should look at the Java Integration stage, which might be easier.

I have no idea how to read ORC at release 8.0; I don't think it is possible.

These are relatively new technologies and you'll need a newer version of DataStage to access them easily.