Hive support is required to CREATE Hive TABLE (AS SELECT)

A.H*_*DAD 3 hiveql pyspark jupyter-notebook

I plan to save Spark DataFrames into a Hive table so that I can query them and extract the latitude and longitude values from them, since a Spark DataFrame is not directly iterable.


Using PySpark in Jupyter, I wrote the following code to create the Spark session:

import findspark
findspark.init()
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# read multiple csv with pyspark
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.sql.catalogImplementation=hive").enableHiveSupport() \
    .getOrCreate()

df = spark.read.csv("Desktop/train/train.csv", header=True)

Pickup_locations = df.select("pickup_datetime", "Pickup_latitude",
                             "Pickup_longitude")

print(Pickup_locations.count())
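A note on the builder call above: PySpark's config() takes the key and the value as two separate arguments, so the single-string form used here registers the whole string as a key and never actually sets spark.sql.catalogImplementation. A minimal sketch of the intended call (enableHiveSupport() sets the same property):

import findspark
findspark.init()

from pyspark.sql import SparkSession

# Key and value passed separately; this is what the single-string
# form was trying to do, and what enableHiveSupport() also sets
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.sql.catalogImplementation", "hive") \
    .enableHiveSupport() \
    .getOrCreate()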

Then I run the HiveQL:

df.createOrReplaceTempView("mytempTable")
spark.sql("create table hive_table as select * from mytempTable")
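As an aside, once Hive support is actually enabled, the temp-view-plus-CTAS step can also be replaced by the DataFrameWriter API, which creates the Hive table directly. A minimal sketch, assuming the df from above:

# Create (or replace) a Hive-managed table straight from the DataFrame;
# mode("overwrite") replaces the table if it already exists
df.write.mode("overwrite").saveAsTable("hive_table")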

When I run the CTAS version, I get this error:

Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);;
'CreateTable `hive_table`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists
+- Project [id#311, vendor_id#312, pickup_datetime#313, dropoff_datetime#314, passenger_count#315, pickup_longitude#316, pickup_latitude#317, dropoff_longitude#318, dropoff_latitude#319, store_and_fwd_flag#320, trip_duration#321]


Abd*_*awi 7

I have faced this situation before. You need to pass the configuration parameter to the spark-submit command so that it treats Hive as the catalog implementation for Spark SQL.

The spark-submit command would look like this:

spark-submit --deploy-mode cluster --master yarn --conf spark.sql.catalogImplementation=hive --class harri_sparkStreaming.com_spark_streaming.App ./target/com-spark-streaming-2.3.0-jar-with-dependencies.jar

The trick is in this part: --conf spark.sql.catalogImplementation=hive
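Since the question runs inside Jupyter rather than through spark-submit, the programmatic equivalent is to set the same property on the builder before the first SparkSession or SparkContext is created; a sketch under that assumption:

from pyspark.sql import SparkSession

# Same effect as --conf spark.sql.catalogImplementation=hive;
# must be set before any SparkSession/SparkContext exists
spark = SparkSession.builder \
    .config("spark.sql.catalogImplementation", "hive") \
    .getOrCreate()

# Sanity check: should print "hive", not "in-memory"
print(spark.conf.get("spark.sql.catalogImplementation"))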

Hope this helps.