DataStage BDFS access to a Hadoop environment (HDP v2.5)

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

wpakkala
Participant
Posts: 7
Joined: Sun Jul 17, 2011 7:13 am


Post by wpakkala »

I am trying to connect my DataStage environment (a job) to a Hortonworks Sandbox v2.5, which runs on CentOS 7.

Both IIS 11.5 and the HDP environment are running properly.

I have followed the installation instructions for setting up Hadoop in IIS: /etc/hosts is configured, I copied the libraries and other files from the HDP platform to the IIS platform and placed them in a separate folder, and I configured the ishdfs.config file.

I know I am missing something, but I cannot determine what it is.

The DataStage job is simple: a BDFS stage feeding a Peek stage.

Here are the values in the BDFS stage:
filename=/hadoop/hdfs/data/user/wayne/trucks.csv
host=sandbox
port number=8020
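Before digging into the Hadoop client setup, it can help to rule out plain networking: does the name `sandbox` resolve on the IIS server (for example via /etc/hosts), and does anything accept TCP connections on port 8020? The sketch below is a minimal, hypothetical helper (`check_namenode` is not part of DataStage or Hadoop) using only the Python standard library. A successful connection only shows the port is open; it says nothing about whether the Hadoop libraries on the IIS side are configured correctly.

```python
import socket
from typing import Optional, Tuple


def check_namenode(host: str, port: int, timeout: float = 5.0) -> Tuple[Optional[str], bool]:
    """Resolve `host` and attempt a plain TCP connection to `port`.

    Returns (resolved_address, reachable). The address is None if the
    name does not resolve through the OS resolver (which normally
    consults /etc/hosts on Linux).
    """
    try:
        address = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        # Open and immediately close a TCP connection; no HDFS protocol is spoken.
        with socket.create_connection((address, port), timeout=timeout):
            return address, True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return address, False
```

Run on the IIS server, `check_namenode("sandbox", 8020)` would use the same host and port as the BDFS stage values above.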

The error I receive when I attempt to view data is:
IIS-DSEE-TUTL-00031 16:53:56(002) <main_program> The open files limit is 1024; raising to 4096.
##I IIS-DSEE-TFCN-00006 16:53:56(003) <main_program> conductor uname: -s=Linux; -r=2.6.32-642.4.2.el6.x86_64; -v=#1 SMP Mon Aug 15 02:06:41 EDT 2016; -n=venus.diamond; -m=x86_64
##I IIS-DSEE-TOSH-00002 16:53:56(004) <main_program> orchgeneral: loaded
##I IIS-DSEE-TOSH-00002 16:53:56(005) <main_program> orchsort: loaded
##I IIS-DSEE-TOSH-00002 16:53:56(006) <main_program> orchstats: loaded
##W IIS-DSEE-TOSH-00049 16:53:56(007) <main_program> Parameter specified but not used in flow: DSProjectMapName
##I IIS-DSEE-TOIX-00275 16:53:56(008) <main_program> Using HDFS connection type: libhdfs.so.
##I IIS-DSEE-TFSC-00001 16:53:56(009) <main_program> APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt
##W IIS-DSEE-USBP-00002 16:53:56(000) <Big_Data_File_0,0> log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
##W IIS-DSEE-USBP-00002 16:53:56(001) <Big_Data_File_0,0> log4j:WARN Please initialize the log4j system properly.
##W IIS-DSEE-USBP-00002 16:53:56(002) <Big_Data_File_0,0> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
##W IIS-DSEE-USBP-00002 16:53:57(000) <Big_Data_File_0,0> hdfsBuilderConnect(forceNewInstance=0, nn=sandbox, port=8020, kerbTicketCachePath=(NULL), userName=wayne) error:
##W IIS-DSEE-USBP-00002 16:53:57(001) <Big_Data_File_0,0> java.io.IOException: No FileSystem for scheme: hdfs
##W IIS-DSEE-USBP-00002 16:53:57(002) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2752)
##W IIS-DSEE-USBP-00002 16:53:57(003) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2759)
##W IIS-DSEE-USBP-00002 16:53:57(004) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
##W IIS-DSEE-USBP-00002 16:53:57(005) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
##W IIS-DSEE-USBP-00002 16:53:57(006) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
##W IIS-DSEE-USBP-00002 16:53:57(007) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
##W IIS-DSEE-USBP-00002 16:53:57(008) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:169)
##W IIS-DSEE-USBP-00002 16:53:57(009) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:166)
##W IIS-DSEE-USBP-00002 16:53:57(010) <Big_Data_File_0,0> at java.security.AccessController.doPrivileged(AccessController.java:366)
##W IIS-DSEE-USBP-00002 16:53:57(011) <Big_Data_File_0,0> at javax.security.auth.Subject.doAs(Subject.java:572)
##W IIS-DSEE-USBP-00002 16:53:57(012) <Big_Data_File_0,0> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
##W IIS-DSEE-USBP-00002 16:53:57(013) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
##E IIS-DSEE-TFPA-00038 16:53:57(000) <Big_Data_File_0,0> Unable to connect to hdfs host sandbox on port 8020: Unknown error 255.


The error message in this output that concerns me is:
java.io.IOException: No FileSystem for scheme: hdfs
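For context: in Hadoop 2.x, `FileSystem.getFileSystemClass` (visible in the stack trace above) maps a URI scheme such as `hdfs` to an implementation class, either from an `fs.<scheme>.impl` configuration property or from `META-INF/services/org.apache.hadoop.fs.FileSystem` service files inside the jars on the JVM's classpath. For `hdfs` the class is `org.apache.hadoop.hdfs.DistributedFileSystem`, typically registered by the hadoop-hdfs jar in Hadoop 2.7-based distributions. So this message usually means that jar is missing from the classpath the libhdfs JVM sees. The sketch below is a hypothetical stdlib-only checker (`jars_registering_hdfs` is an illustrative name): pass it the CLASSPATH string the connector uses, assumed here to be the one set up through ishdfs.config.

```python
import os
import zipfile
from typing import List

SERVICE_FILE = "META-INF/services/org.apache.hadoop.fs.FileSystem"
HDFS_IMPL = "org.apache.hadoop.hdfs.DistributedFileSystem"


def expand_classpath(classpath: str) -> List[str]:
    """Split a Java-style classpath and expand 'dir/*' entries to the jars in dir."""
    jars: List[str] = []
    for entry in classpath.split(os.pathsep):
        if entry.endswith("*"):
            directory = entry[:-1] or "."
            if os.path.isdir(directory):
                jars.extend(
                    os.path.join(directory, name)
                    for name in sorted(os.listdir(directory))
                    if name.endswith(".jar")
                )
        elif entry:
            jars.append(entry)
    return jars


def jars_registering_hdfs(classpath: str) -> List[str]:
    """Return the jars whose FileSystem service file lists the HDFS implementation."""
    found: List[str] = []
    for jar in expand_classpath(classpath):
        if not jar.endswith(".jar") or not os.path.isfile(jar):
            continue  # skip directories and entries that do not exist
        try:
            with zipfile.ZipFile(jar) as zf:
                if SERVICE_FILE in zf.namelist():
                    text = zf.read(SERVICE_FILE).decode("utf-8", "replace")
                    if HDFS_IMPL in text.split():
                        found.append(jar)
        except zipfile.BadZipFile:
            continue  # not a readable jar
    return found
```

An empty result for the connector's classpath would be consistent with the "No FileSystem for scheme: hdfs" error.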

Any guidance is appreciated.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I haven't been down this path before, but I wanted to point out that in problems like this the error lines are usually the ones to focus on. They are the lines that start with ##E.
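If the job log has been copied into a text file, a few lines of Python can pull out just the ##E lines and give a quick count of each severity marker. This is a minimal sketch; the function names are hypothetical, and it assumes each message starts with a marker such as `##I`, `##W` or `##E`, as in the log posted above.

```python
from collections import Counter
from typing import Iterable, List


def error_lines(log_lines: Iterable[str]) -> List[str]:
    """Keep only lines whose severity marker is ##E.

    Leading whitespace and a stray '>' (common when logs are pasted or
    quoted) are ignored when checking the marker.
    """
    return [
        line.rstrip("\n")
        for line in log_lines
        if line.lstrip(" \t>").startswith("##E")
    ]


def severity_counts(log_lines: Iterable[str]) -> Counter:
    """Count lines per marker, e.g. {'##I': 5, '##W': 17, '##E': 1}."""
    counts: Counter = Counter()
    for line in log_lines:
        stripped = line.lstrip(" \t>")
        if stripped.startswith("##") and len(stripped) >= 3:
            counts[stripped[:3]] += 1
    return counts
```

With a hypothetical exported file `job.log`, `error_lines(open("job.log"))` would return just the error messages.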

See if the IBM articles in this Google search are of any help:
https://www.google.com/?gws_rd=ssl#q=Un ... +error+255

Mike
wpakkala
Participant
Posts: 7
Joined: Sun Jul 17, 2011 7:13 am


Post by wpakkala »

To all

I have resolved the problem. It turns out that numerous libraries and files must be copied to the IIS server (as per the documentation) to configure IIS to support Hadoop.

After redoing that work, I can connect to the files. The name I was using was also incorrect.

Now I have file-processing problems due to the file format (Windows vs. Unix), but the connectivity issues are resolved.
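Assuming the Windows vs. Unix difference is the usual one, CRLF (`\r\n`) line endings instead of LF (`\n`), one option is to normalize the file before loading it into HDFS (for example with `hdfs dfs -put`). The sketch below is a hypothetical helper, not a DataStage feature. It reads the whole file into memory, which is fine for sandbox-sized files, and reports how many line endings it changed.

```python
from pathlib import Path


def crlf_to_lf(src: Path, dst: Path) -> int:
    """Copy src to dst, converting Windows CRLF line endings to Unix LF.

    Returns the number of CRLF sequences that were replaced.
    Works on raw bytes, so the file's text encoding is left untouched.
    """
    data = src.read_bytes()
    count = data.count(b"\r\n")
    dst.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

Working on bytes rather than text avoids guessing the file's encoding; lone `\r` characters (old Mac style) are deliberately left alone.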

Wayne