Writing to Hadoop using Datastage 11.5.2 Parallel Hive
-
- Participant
- Posts: 7
- Joined: Mon Mar 29, 2010 1:36 pm
- Location: WASHINGTON, DC
Two questions:
1) Does your site use Kerberos for security? If so, I might be able to help (I have only worked at Kerberos sites).
2) Have you or your admin set up the config file for Hive?
https://www.ibm.com/support/knowledgece ... river.html
The file should look like this (but with the correct installation location for your site), assuming you are using the Hive library that comes with BigIntegrate:
$ pwd
/InformationServer/Server/DSEngine
$ cat isjdbc.config
CLASSPATH=/InformationServer/ASBNode/lib/java/IShive.jar
CLASS_NAMES=com.ibm.isf.jdbc.hive.HiveDriver
Answer those two questions first... Then I might be able to assist with getting the stage working (assuming Kerberos is used).
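A quick way to sanity-check the file before testing the stage is to parse it and confirm that every jar on the CLASSPATH actually exists. This is only a minimal sketch: the helper names are made up for illustration, and it assumes CLASSPATH entries are separated by ':' (Linux) and CLASS_NAMES entries by ';'.

```python
from pathlib import Path

# Assumed separators: ':' for CLASSPATH on Linux, ';' for CLASS_NAMES.
SEPARATORS = {"CLASSPATH": ":", "CLASS_NAMES": ";"}


def parse_isjdbc_config(text):
    """Parse KEY=VALUE lines of an isjdbc.config file into a dict of lists."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        sep = SEPARATORS.get(key)
        parts = value.split(sep) if sep else [value]
        settings[key] = [p.strip() for p in parts if p.strip()]
    return settings


def missing_jars(settings):
    """Return the CLASSPATH entries that do not exist as files on this host."""
    return [p for p in settings.get("CLASSPATH", []) if not Path(p).is_file()]


# The sample content is the file shown in the post above.
sample = (
    "CLASSPATH=/InformationServer/ASBNode/lib/java/IShive.jar\n"
    "CLASS_NAMES=com.ibm.isf.jdbc.hive.HiveDriver\n"
)
config = parse_isjdbc_config(sample)
print("driver classes:", config["CLASS_NAMES"])
print("missing jars:", missing_jars(config))
```

On the engine host you would read the real file instead of the sample string; a non-empty "missing jars" list suggests the installation path in CLASSPATH needs adjusting for your site.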
-
- Participant
- Posts: 103
- Joined: Tue Jan 06, 2015 4:30 am
Andy, can I ask you a question: are you able to connect to HiveServer2 using the ISHive.jar that is shipped with IIS itself?
I also have a Kerberos setup, and I am absolutely clueless about what I am doing wrong. I get this error:
java.sql.SQLException: [IBM][Hive JDBC Driver]THRIFT protocol error.
This is the JDBC URL I use:
Code: Select all
jdbc:ibm:hive://hivehostname:2181;DataBaseName=test;AuthenticationMethod=kerberos;ServicePrincipalName=datastage@XXX.NET;loginConfigName=JDBC_DRIVER_dsadm_keytab
Here is the JDBCDriverLogin.conf that I created in the same folder as ISHive.jar:
Code: Select all
JDBC_DRIVER_dsadm_keytab {
com.ibm.security.auth.module.Krb5LoginModule required
credsType=both
principal="datastage@XXX.NET"
useKeytab="FILE:/etc/security/keytabs/datastage.hdfs.headless.keytab";
};
JDBC_DRIVER_cache{
com.ibm.security.auth.module.Krb5LoginModule required
credsType=initiator
principal="datastage@XXX.NET"
useCcache="FILE:/tmp/krb5cc_22367";
};
The above does not work with the Hive connector for me, although the beeline client does work. I am running this as dsadm:
Code: Select all
kinit -kt /etc/security/keytabs/datastage.hdfs.headless.keytab datastage
beeline --verbose=true -u "jdbc:hive2://hivehostname:2181/hive;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
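When comparing a failing connector URL with a working one, it can help to split the string into host, port and properties and look at them side by side. The following is a rough sketch, not part of the IBM driver; the function name is hypothetical and it only assumes the `jdbc:ibm:hive://host:port;key=value;...` layout seen in this thread.

```python
def parse_ibm_hive_url(url):
    """Split a jdbc:ibm:hive URL into host, port and a dict of properties."""
    prefix = "jdbc:ibm:hive://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:ibm:hive URL")
    # First segment is host:port, the rest are key=value properties.
    authority, *pairs = url[len(prefix):].split(";")
    host, _, port = authority.partition(":")
    properties = {}
    for pair in pairs:
        if pair:
            key, _, value = pair.partition("=")
            properties[key] = value
    return {"host": host, "port": int(port) if port else None, "properties": properties}


# URL taken from the post above.
url = (
    "jdbc:ibm:hive://hivehostname:2181;DataBaseName=test;"
    "AuthenticationMethod=kerberos;ServicePrincipalName=datastage@XXX.NET;"
    "loginConfigName=JDBC_DRIVER_dsadm_keytab"
)
parsed = parse_ibm_hive_url(url)
print(parsed["host"], parsed["port"])
for key, value in parsed["properties"].items():
    print(f"  {key} = {value}")
```

One thing this makes visible: the beeline command uses serviceDiscoveryMode=zooKeeper, so port 2181 there is presumably the ZooKeeper client port rather than the HiveServer2 Thrift port. Whether the IBM driver can use that port the same way is not established in this thread, so it may be worth checking.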
Yes, we connect to HiveServer2, but our URL looks slightly different:
jdbc:ibm:hive://hiveserver.company.com:10000;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.company.com@KERBEROS_DEFAULT_REALM;loginConfigName=JDBC_DRIVER_USERID
Look in /etc/krb5.conf under [libdefaults] for the Kerberos default realm. Also, the USERID seems to be case-sensitive and should be upper case.
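As a rough illustration of the two points above, the sketch below pulls default_realm out of the libdefaults section of a krb5.conf and assembles a URL in the same shape as the one shown, upper-casing the user ID. The function names and the sample realm EXAMPLE.COM are placeholders, not anything from the product.

```python
def default_realm(krb5_text):
    """Return default_realm from the [libdefaults] section, or None if absent."""
    in_libdefaults = False
    for raw in krb5_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("["):
            in_libdefaults = line.lower() == "[libdefaults]"
        elif in_libdefaults and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "default_realm":
                return value.strip()
    return None


def build_hive_url(host, port, realm, userid):
    """Assemble a URL in the format used in this thread (userid upper-cased)."""
    return (
        f"jdbc:ibm:hive://{host}:{port};AuthenticationMethod=kerberos;"
        f"ServicePrincipalName=hive/{host}@{realm};"
        f"loginConfigName=JDBC_DRIVER_{userid.upper()}"
    )


# Placeholder krb5.conf content; on a real host read /etc/krb5.conf instead.
sample_krb5 = """
[libdefaults]
  default_realm = EXAMPLE.COM
  dns_lookup_kdc = false
[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
  }
"""
realm = default_realm(sample_krb5)
print(build_hive_url("hiveserver.company.com", 10000, realm, "dsadm"))
```

The loginConfigName produced here has to match an entry name in your JDBCDriverLogin.conf exactly.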
I also recommend debugging in "non-YARN mode" because it makes things simpler. To do that, use an APT configuration file that only defines nodes on the edge node (no dynamic hosts). Also set APT_YARN_CONFIG to the empty string and APT_YARN_MODE to 0 (zero) in your job. That should make it run on the edge node only.
Hopefully that will get your connection up and running on the edge node. If that works, then there are probably other steps that have to be done to get the keytab distributed to the data nodes for "YARN mode".
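For the edge-node-only run, here is a sketch that writes out a static APT configuration file and lists the two settings mentioned above. The host name and the disk and scratch paths are made-up placeholders; substitute whatever your existing configuration files use.

```python
def single_host_apt_config(fastname, node_count, disk, scratch):
    """Render an APT configuration file whose nodes all run on one fixed host."""
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{fastname}"',
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Job-level settings from the post: empty YARN config and YARN mode 0.
NON_YARN_ENV = {"APT_YARN_CONFIG": "", "APT_YARN_MODE": "0"}

# Hypothetical edge node name and paths, for illustration only.
print(single_host_apt_config("edgenode01", 2, "/data/ds/datasets", "/data/ds/scratch"))
for name, value in NON_YARN_ENV.items():
    print(f'{name}="{value}"')
```

Point the job's APT_CONFIG_FILE at the generated file and add the two variables as job-level environment variables.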
-
- Participant
- Posts: 7
- Joined: Mon Mar 29, 2010 1:36 pm
- Location: WASHINGTON, DC
Hi -
Sorry for the six-month delay; we abandoned this project and have only now restarted work on it. The update at this stage is that we were able to establish a connection using the Hive Connector, in the sense that the "TEST" connection returns a success message. We then built a simple test job to read a few records from a flat file and write them to a target table in Impala using the Hive Connector. The job does not return any error messages, but it does not load any rows into the target table either. What could be going on? Again, we are on DataStage v11.5 Parallel running on a Linux box.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Re: Writing to Hadoop using Datastage 11.5.2 Parallel Hive
I am using this format:
jdbc:ibm:hive://hiveserver.company.com:10000;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.company.com@KERBEROS_DEFAULT_REALM;loginConfigName=JDBC_DRIVER_USERID
in the JDBC connector, and I am able to get through for a limited number of records using the default queue. Is there any way to set the queue in the above connection string?