Why does pyspark fail with "Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars."?

nav*_*dul 8 apache-spark pyspark

I'm running a standalone Apache Spark 2.0.0 cluster with two nodes, and I have not installed Hive. I get the following error when creating a DataFrame.

from pyspark import SparkContext
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not pyspark

sqlContext = SQLContext(sc)  # sc is the SparkContext predefined in the pyspark/IPython shell
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-9-63bc4f21f23e> in <module>()
----> 1 sqlContext.createDataFrame(l).collect()

/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
    297         Py4JJavaError: ...
    298         """
--> 299         return self.sparkSession.createDataFrame(data, schema, samplingRatio)
    300 
    301     @since(1.3)

/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.pyc in createDataFrame(self, data, schema, samplingRatio)
    522             rdd, schema = self._createFromLocal(map(prepare, data), schema)
    523         jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
--> 524         jdf = self._jsparkSession.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
    525         df = DataFrame(jdf, self._wrapped)
    526         df._schema = schema

/home/mok/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    931         answer = self.gateway_client.send_command(command)
    932         return_value = get_return_value(
--> 933             answer, self.gateway_client, self.target_id, self.name)
    934 
    935         for temp_arg in temp_args:

/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     77                 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
     78             if s.startswith('java.lang.IllegalArgumentException: '):
---> 79                 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
     80             raise
     81     return deco

IllegalArgumentException: u'Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars.'

Should I install Hive, or is this something I can fix in the configuration?

小智 11

IllegalArgumentException: u'Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars.'

I had the same problem and fixed it by switching to Java 8. Make sure JDK 8 is installed and that the environment variables point to it.

Do not use Java 11 with Spark/PySpark 2.4.
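
If you launch PySpark from a plain Python script (rather than the pyspark shell), you can point JAVA_HOME at JDK 8 before the JVM gateway starts, since PySpark reads it when spawning the JVM. A minimal sketch, assuming pyspark is importable and JDK 8 lives at /usr/lib/jvm/java-8-openjdk-amd64 (adjust the path for your system):

import os

# Must run before the SparkContext (and thus the JVM) is created.
# Assumed JDK 8 location -- adjust to your installation.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[2]", "jdk8-check")
sqlContext = SQLContext(sc)
print(sqlContext.createDataFrame([('Alice', 1)]).collect())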


jer*_*man 6

If you have multiple Java versions installed, you have to figure out which one Spark is actually using (I did this by trial and error). I started with

JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"

and ended up with

JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
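
Instead of pure trial and error, you can also ask the JVM that PySpark actually launched which Java it is running on. A minimal sketch; note that sc._jvm is PySpark's (internal) py4j gateway into the JVM:

from pyspark import SparkContext

sc = SparkContext("local[2]", "java-version-check")
# Query the running JVM for its Java version and installation directory.
print(sc._jvm.java.lang.System.getProperty("java.version"))
print(sc._jvm.java.lang.System.getProperty("java.home"))
sc.stop()

If this prints an 11.x version, point JAVA_HOME at the JDK 8 installation and restart PySpark.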