Tags: hive, apache-spark, pyspark
I am using Hadoop 2.10.x + Hive 3.1.x + Spark 3.0.1 and trying to load log files into Hive via PySpark. I followed the code from the Spark documentation to connect to Hive:
from os.path import abspath
from pyspark.sql import SparkSession

# warehouse_location points to the default location for managed databases and tables
warehouse_location = abspath('spark-warehouse')

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()
But it always raises pyspark.sql.utils.IllegalArgumentException: <exception str() failed>.
Traceback (most recent call last):
File "log_extra.py", line 16, in <module>
.appName("Python Spark SQL Hive integration example") \
File "/usr/local/python37/lib/python3.7/site-packages/pyspark/sql/session.py", line 191, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "/usr/local/python37/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/python37/lib/python3.7/site-packages/pyspark/sql/utils.py", line 134, in deco
raise_from(converted)
File "<string>", line 3, in raise_from
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
If I do not call enableHiveSupport(), this Python script runs, but it only connects to the built-in Hive. I have already put hive-site.xml into spark/conf. Now I do not know how to connect to my Hive through Spark.
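For comparison, this is a minimal sketch of what an explicit metastore configuration would look like, passing the metastore URI on the builder instead of relying on hive-site.xml being picked up; the thrift://localhost:9083 address is a placeholder, not my actual setup:

from pyspark.sql import SparkSession

# Sketch only: pass the Hive metastore URI directly on the builder.
# Replace the placeholder thrift address with the real metastore host:port.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .enableHiveSupport() \
    .getOrCreate()

# If the connection works, this lists the databases from the external metastore.
spark.sql("SHOW DATABASES").show()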