I recently developed a job that connects to a Hive database in the Hadoop ecosystem. I have tried two methods of connecting to Hive:
1. ODBC Connector
2. Hive Connector
However, I am facing massive performance issues with the Hive Connector stage. It takes hours to load just 80k rows, whereas with the ODBC Connector stage the performance is very good: the load finishes in around 5 minutes.
Does anyone have any ideas on this? Ideally the native connector stage should be faster and offer more options, but in my case its performance is really bad.
I have just received some patches, suggested by IBM, to install on my services/engine tier. These may improve the speed. I will install them and report back with my findings.