
How do I read a Parquet file in PySpark that was written from Spark?

I use two Jupyter notebooks to do different things in my analysis. In my Scala notebook, I write some of my cleaned data to Parquet:

partitionedDF.select("noStopWords","lowerText","prediction").write.save("swift2d://xxxx.keystone/commentClusters.parquet")


I then go to my Python notebook to read the data back in:

df = spark.read.load("swift2d://xxxx.keystone/commentClusters.parquet")


I get the following error:

AnalysisException: u'Unable to infer schema for ParquetFormat at swift2d://RedditTextAnalysis.keystone/commentClusters.parquet. It must be specified manually;'


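I have looked at the Spark documentation, and I don't think I should be required to specify a schema. Has anyone run into something like this? Should I be doing something differently when I save or load? The data is landing in Object Storage.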

Edit: I'm using Spark 2.0 for both the read and the write.

Edit 2: This was done in a project in Data Science Experience.
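For reference, here is a minimal sketch of the read with the Parquet format named explicitly and a schema supplied only when inference fails. The helper name `read_parquet` and its `schema` parameter are hypothetical, not part of Spark. The function only calls `DataFrameReader.schema` and `DataFrameReader.parquet`. As far as I understand, Spark reports "Unable to infer schema" when it finds no Parquet data files at the path, so it is also worth confirming that the write finished and that the container name in the read path matches the one used for the write.

```python
def read_parquet(spark, path, schema=None):
    """Read Parquet data at `path` using an existing SparkSession.

    `schema` is optional: pass a pyspark.sql.types.StructType only if
    Spark cannot infer the schema on its own.
    """
    reader = spark.read
    if schema is not None:
        # Explicit schema: Spark skips inference and uses this instead.
        reader = reader.schema(schema)
    # Name the format explicitly rather than relying on the default source.
    return reader.parquet(path)


# Usage in a PySpark notebook where `spark` is already defined:
# df = read_parquet(spark, "swift2d://xxxx.keystone/commentClusters.parquet")
# df.printSchema()
```

If the plain call still fails while the same path reads fine from the Scala notebook, that points at the path or the storage configuration on the Python side rather than at the schema.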

python scala apache-spark pyspark data-science-experience

Ros*_*wis

2019-04-12

Score: 25

Answers: 2

Views: 40k
