Access Big Data, AVRO, and ORC from File Connector

Post questions here related to DataStage Enterprise/PX Edition, covering areas such as parallel job design, parallel datasets, BuildOps, Wrappers, etc.


deesh
Participant
Posts: 193
Joined: Mon Oct 08, 2007 2:57 am


Post by deesh

Hi,

Has anyone worked with Big Data formats such as AVRO, ORC, and Sequence files through the File Connector stage? If so, please tell me how to develop the jobs and what installations are required.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell

IBM does not support accessing Big Data files in version 8.0. You must upgrade to version 11.5; then the documentation below on the Big Data File stage will be relevant:

http://www.ibm.com/support/knowledgecen ... Stage.html

You may also need to apply this patch, which adds AVRO/ORC support to the File Connector:
http://www-01.ibm.com/support/docview.w ... wg24041535

Vik has some nice documentation on using ORC and AVRO at 11.5:
https://www.linkedin.com/pulse/avro-orc ... r-malhotra

Your only alternative for AVRO at release 8.0 is to build a Java interface of some sort for it, since AVRO is a serialization format accessed through its own API (its schemas are written in JSON). I found something on developerWorks for it:

DataStage Java Pack sample: This sample shows you how to use the Avro API to read and write an Avro data file with an Avro schema in a Java Transformer user class
https://www.ibm.com/developerworks/comm ... 95a7d64a2e.

That might work at 8.0. If you upgrade to a newer release and still want an API interface, look at the Java Integration stage; it might be easier.

I have no idea how to read ORC at release 8.0, and I don't think it is possible.

These are relatively new technologies and you'll need a newer version of DataStage to access them easily.
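Not part of the original reply: since the question mixes AVRO, ORC, and Sequence files, it can help to confirm which format a file actually is before choosing between the File Connector, the Big Data File stage, or a Java-based workaround. The sketch below is a minimal, standard-library-only Java check of each file's leading "magic" bytes, assuming the published headers of each format: an Avro object container file starts with `Obj` plus the byte 1, an ORC file starts with `ORC`, and a Hadoop SequenceFile starts with `SEQ` plus a version byte. The class name `FormatSniffer` and its methods are hypothetical; it does not parse any records, and wiring it into a Java Pack or Java Integration stage class is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: classifies a file by its leading magic bytes only.
public class FormatSniffer {

    // Avro object container files begin with 'O','b','j' followed by the byte 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};
    // ORC files begin with the ASCII bytes "ORC".
    private static final byte[] ORC_MAGIC = {'O', 'R', 'C'};
    // Hadoop SequenceFiles begin with "SEQ" followed by a version byte (not checked here).
    private static final byte[] SEQ_MAGIC = {'S', 'E', 'Q'};

    /** Returns "AVRO", "ORC", "SEQUENCE", or "UNKNOWN" for the given leading bytes. */
    public static String detect(byte[] header) {
        if (startsWith(header, AVRO_MAGIC)) return "AVRO";
        if (startsWith(header, ORC_MAGIC)) return "ORC";
        if (startsWith(header, SEQ_MAGIC)) return "SEQUENCE";
        return "UNKNOWN";
    }

    // True when data has at least prefix.length bytes and they match prefix exactly.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Reads at most the first 4 bytes of a local file and classifies it. */
    public static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[4];
            int n = in.readNBytes(buf, 0, buf.length); // Java 9+
            return detect(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        // Print the detected format for each path given on the command line.
        for (String arg : args) {
            System.out.println(arg + ": " + detect(Paths.get(arg)));
        }
    }
}
```

For example, on Java 11 or later `java FormatSniffer.java part-00000.avro` prints the file name followed by the detected format. Checking the header is cheap and works on a local copy of any file; reading the actual records still requires the 11.5 connectors or the Avro/ORC libraries mentioned above.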
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020