okw*_*wap 5 python apache-spark apache-spark-sql pyspark
I can read the table right after creating it, but how do I read it again in another Spark session?
Given the code:
```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .getOrCreate()

df = spark.read.parquet("examples/src/main/resources/users.parquet")

(df
    .write
    .saveAsTable("people_partitioned_bucketed"))

# retrieve rows from the table as expected
spark.sql("select * from people_partitioned_bucketed").show()

spark.stop()

# open a Spark session again
spark = SparkSession \
    .builder \
    .getOrCreate()

# the table does not exist this time
spark.sql("select * from people_partitioned_bucketed").show()
```
Output:
+------+----------------+--------------+
|  name|favorite_numbers|favorite_color|
+------+----------------+--------------+
|Alyssa|  [3, 9, 15, 20]|          null|
|   Ben|              []|           red|
+------+----------------+--------------+
Traceback (most recent call last):
  File "/home//workspace/spark/examples/src/main/python/sql/datasource.py", line 246, in <module>
    spark.sql("select * from people_partitioned_bucketed").show()
  File "/home//virtualenvs/spark/local/lib/python2.7/site-packages/pyspark/sql/session.py", line 603, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/home//virtualenvs/spark/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home//virtualenvs/spark/local/lib/python2.7/site-packages/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Table or view not found: people_partitioned_bucketed; line 1 pos 14'
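From the documentation:
For file-based data sources, e.g. text, parquet, json, etc., you can specify a custom table path via the path option, e.g. df.write.option("path", "/some/path").saveAsTable("t"). When the table is dropped, the custom table path will not be removed and the table data is still there. If no custom table path is specified, Spark will write data to a default table path under the warehouse directory. When the table is dropped, the default table path will be removed too.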
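In other words, a path needs to be specified with option("path", ...) when saving the table. If no path is specified, the table is no longer available once the Spark session is closed.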
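As a rough illustration of the approach above, here is a minimal sketch that writes the table to an explicit path and reads the files back in a later session. The helper names `save_table_with_path` and `load_from_path` are hypothetical, as is the choice of `mode("overwrite")`. The sketch assumes the default Parquet format and no persistent metastore, so the second session reads the files directly from the path rather than looking the table up by name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only for the type hints; the helpers just call methods
    # on the DataFrame / SparkSession objects passed in.
    from pyspark.sql import DataFrame, SparkSession


def save_table_with_path(df: "DataFrame", table_name: str, path: str) -> None:
    """Save df as a table whose files live at an explicit path (hypothetical helper)."""
    (df.write
        .mode("overwrite")       # assumption: replacing earlier output is acceptable
        .option("path", path)    # custom table path, as in the quoted documentation
        .saveAsTable(table_name))


def load_from_path(spark: "SparkSession", path: str) -> "DataFrame":
    """Read the saved files back in a later session (hypothetical helper).

    Assumes the table was written in the default Parquet format.
    """
    return spark.read.parquet(path)
```

In the first session this would be called as `save_table_with_path(df, "people_partitioned_bucketed", "/some/path")`. After `spark.stop()` and a fresh `getOrCreate()`, `load_from_path(spark, "/some/path").show()` should then display the same rows, provided the path is still reachable from the new session.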