DataStage BDFS access to a Hadoop environment (HDP v2.5)
Posted: Wed Sep 28, 2016 2:55 pm
I am trying to connect my DataStage environment (job) to a Hortonworks sandbox v2.5 (runs on CentOS 7).
IIS 11.5 is operating properly, and the HDP environment is running fine.
I have followed the installation instructions for setting up Hadoop in IIS: /etc/hosts is configured, the libraries etc. were copied from the HDP platform to the IIS platform and placed in a separate folder, and the ishdfs.config file is configured.
I know I am missing something but cannot determine what it is.
The DataStage job is a simple BDFS stage to a Peek.
Here are the values in the BDFS stage:
filename=/hadoop/hdfs/data/user/wayne/trucks.csv
host=sandbox
port number=8020
The error I receive when I attempt to view data is:
IIS-DSEE-TUTL-00031 16:53:56(002) <main_program> The open files limit is 1024; raising to 4096.
##I IIS-DSEE-TFCN-00006 16:53:56(003) <main_program> conductor uname: -s=Linux; -r=2.6.32-642.4.2.el6.x86_64; -v=#1 SMP Mon Aug 15 02:06:41 EDT 2016; -n=venus.diamond; -m=x86_64
##I IIS-DSEE-TOSH-00002 16:53:56(004) <main_program> orchgeneral: loaded
##I IIS-DSEE-TOSH-00002 16:53:56(005) <main_program> orchsort: loaded
##I IIS-DSEE-TOSH-00002 16:53:56(006) <main_program> orchstats: loaded
##W IIS-DSEE-TOSH-00049 16:53:56(007) <main_program> Parameter specified but not used in flow: DSProjectMapName
##I IIS-DSEE-TOIX-00275 16:53:56(008) <main_program> Using HDFS connection type: libhdfs.so.
##I IIS-DSEE-TFSC-00001 16:53:56(009) <main_program> APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt
##W IIS-DSEE-USBP-00002 16:53:56(000) <Big_Data_File_0,0> log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
##W IIS-DSEE-USBP-00002 16:53:56(001) <Big_Data_File_0,0> log4j:WARN Please initialize the log4j system properly.
##W IIS-DSEE-USBP-00002 16:53:56(002) <Big_Data_File_0,0> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
##W IIS-DSEE-USBP-00002 16:53:57(000) <Big_Data_File_0,0> hdfsBuilderConnect(forceNewInstance=0, nn=sandbox, port=8020, kerbTicketCachePath=(NULL), userName=wayne) error:
##W IIS-DSEE-USBP-00002 16:53:57(001) <Big_Data_File_0,0> java.io.IOException: No FileSystem for scheme: hdfs
##W IIS-DSEE-USBP-00002 16:53:57(002) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2752)
##W IIS-DSEE-USBP-00002 16:53:57(003) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2759)
##W IIS-DSEE-USBP-00002 16:53:57(004) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
##W IIS-DSEE-USBP-00002 16:53:57(005) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
##W IIS-DSEE-USBP-00002 16:53:57(006) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
##W IIS-DSEE-USBP-00002 16:53:57(007) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
##W IIS-DSEE-USBP-00002 16:53:57(008) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:169)
##W IIS-DSEE-USBP-00002 16:53:57(009) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:166)
##W IIS-DSEE-USBP-00002 16:53:57(010) <Big_Data_File_0,0> at java.security.AccessController.doPrivileged(AccessController.java:366)
##W IIS-DSEE-USBP-00002 16:53:57(011) <Big_Data_File_0,0> at javax.security.auth.Subject.doAs(Subject.java:572)
##W IIS-DSEE-USBP-00002 16:53:57(012) <Big_Data_File_0,0> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
##W IIS-DSEE-USBP-00002 16:53:57(013) <Big_Data_File_0,0> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
##E IIS-DSEE-TFPA-00038 16:53:57(000) <Big_Data_File_0,0> Unable to connect to hdfs host sandbox on port 8020: Unknown error 255.
The message in this log that concerns me is:
java.io.IOException: No FileSystem for scheme: hdfs
Any guidance is appreciated.
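As far as I understand, "No FileSystem for scheme: hdfs" generally means that Hadoop's FileSystem class could not find an implementation registered for hdfs:// on the Java classpath, which is often a sign that the hadoop-hdfs jar is missing or not being picked up. A minimal Python sketch for checking the folder of copied jars is below. It assumes the HDFS implementation class is org.apache.hadoop.hdfs.DistributedFileSystem (as in Hadoop 2.x); the function name jars_providing is just illustrative and is not part of any IIS or Hadoop tooling.

```python
import zipfile
from pathlib import Path

# Assumption: in Hadoop 2.x the hdfs:// scheme is implemented by this class,
# which normally ships in the hadoop-hdfs jar.
HDFS_CLASS = "org/apache/hadoop/hdfs/DistributedFileSystem.class"


def jars_providing(jar_dir, class_entry=HDFS_CLASS):
    """Return the jars under jar_dir (searched recursively) that contain class_entry."""
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar)
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt archives instead of aborting the scan.
            continue
    return hits
```

Calling jars_providing() on the folder where the HDP libraries were placed returns the jars that contain the class. An empty list would suggest the hadoop-hdfs jar was not copied; a non-empty list would point instead at the classpath configured for the stage not including that folder.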