response = "mi_or_chd_5"
outcome = sqlc.sql("""select eid,{response} as response
from outcomes
where {response} IS NOT NULL""".format(response=response))
outcome.write.parquet(response, mode="overwrite") # Success
print outcome.schema
StructType(List(StructField(eid,IntegerType,true),StructField(response,ShortType,true)))
Run Code Online (Sandbox Code Playgroud)
但是之后:
outcome2 = sqlc.read.parquet(response) # fail
Run Code Online (Sandbox Code Playgroud)
失败了:
AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
Run Code Online (Sandbox Code Playgroud)
在
/usr/local/lib/python2.7/dist-packages/pyspark-2.1.0+hadoop2.7-py2.7.egg/pyspark/sql/utils.pyc in deco(*a, **kw)
Run Code Online (Sandbox Code Playgroud)
镶木地板的文档说格式是自我描述的,并且在保存镶木地板文件时可以使用完整的模式.是什么赋予了?
使用Spark 2.1.1.在2.2.0中也失败了.
发现此错误报告,但已在2.0.1,2.1.0中修复.
更新:当与master ="local"连接时,此工作,当连接到master ="mysparkcluster"时失败.