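I have been trying to load a Hive table into Spark using sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver"), without any success. I have done some research and read the following:
Spark 1.5.1 not working with hive jdbc 1.2.0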
http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html
I am using the latest Hortonworks Sandbox 2.6 and have asked the community there the same question:
What I want to do is very simple in pyspark:
df = sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver", url="jdbc:hive2://localhost:10016/default", dbtable="sample_07", user="maria_dev", password="maria_dev").load()
This gives me the following error:
17/12/30 19:55:14 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10016/default
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark-client/python/pyspark/sql/readwriter.py", line 139, in load
    return self._df(self._jreader.load())
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error …
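For easier reading, the read attempted above can be spelled out with the JDBC options collected in a dictionary. This is only a sketch of the same call, not a fix for the error: hive_jdbc_options and load_hive_table are hypothetical helper names introduced here, and the host, port, database, and credentials are simply the values from the one-liner above.

```python
def hive_jdbc_options(table, host="localhost", port=10016, database="default",
                      user="maria_dev", password="maria_dev"):
    """Build the options for a Hive JDBC read (hypothetical helper).

    The defaults are the sandbox values used in the question.
    """
    return {
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "url": "jdbc:hive2://{}:{}/{}".format(host, port, database),
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_hive_table(sql_context, table, **overrides):
    """Issue the same read as the one-liner; needs a live SQLContext."""
    opts = hive_jdbc_options(table, **overrides)
    return sql_context.read.format("jdbc").options(**opts).load()


# Building the options does not require Spark, so they can be inspected first.
opts = hive_jdbc_options("sample_07")
print(opts["url"])
```

In the pyspark shell the equivalent of the original call would be df = load_hive_table(sqlContext, "sample_07").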