I am new to PySpark. Can anyone help me read JSON data using PySpark? This is what we have done so far:
(1) main.py
import os.path
from pyspark.sql import SparkSession

def fileNameInput(filename,spark):
    try:
        if(os.path.isfile(filename)):
            loadFileIntoHdfs(filename,spark)
        else:
            print("File does not exists")
    except OSError:
        print("Error while finding file")

def loadFileIntoHdfs(fileName,spark):
    df = spark.read.json(fileName)
    df.show()

if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()
    file_name = input("Enter file location : ")
    fileNameInput(file_name,spark)
When I run the code above, it throws the following error message:
  File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling …
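
The error message is cut off, so the actual cause cannot be read from it. Two things that are often worth ruling out with code like main.py, offered only as a sketch: `os.path.isfile` checks the local filesystem, while `spark.read.json` resolves a path without a scheme against Spark's default filesystem (which may be HDFS), and `spark.read.json` expects one JSON object per line unless `multiLine` is enabled. The helper names `to_local_uri` and `read_json` below are hypothetical, and the sketch assumes a POSIX path and a file that is readable from wherever Spark runs.

```python
from pathlib import Path


def to_local_uri(filename):
    """Return an explicit file:// URI for an existing local file.

    Hypothetical helper: assumes a POSIX-style absolute path after resolving.
    """
    path = Path(filename).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such local file: {path}")
    # An explicit scheme stops Spark from resolving the path against
    # its default filesystem.
    return "file://" + path.as_posix()


def read_json(spark, filename, multiline=False):
    """Read a local JSON file using an existing SparkSession.

    Hypothetical helper: `spark` is the session created in main.py.
    """
    uri = to_local_uri(filename)
    # multiLine=True is only needed when a single JSON record spans
    # several lines; the default expects one JSON object per line.
    return spark.read.json(uri, multiLine=multiline)
```

With these in place, `read_json(spark, file_name).show()` could stand in for the call to `loadFileIntoHdfs`; if the file holds a single pretty-printed JSON document, passing `multiline=True` is the variant to try.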