Posted by Pra*_*tel

Unable to read data from JSON using PySpark

I am new to PySpark. Can anyone help me read JSON data with PySpark? This is what we have done so far:

(1) main.py

import os.path
from pyspark.sql import SparkSession


def fileNameInput(filename, spark):
    # os.path.isfile only checks the local filesystem of the driver machine.
    try:
        if os.path.isfile(filename):
            loadFileIntoHdfs(filename, spark)
        else:
            print("File does not exist")
    except OSError:
        print("Error while finding file")


def loadFileIntoHdfs(fileName, spark):
    # Read the JSON file into a DataFrame and print the first rows.
    df = spark.read.json(fileName)
    df.show()


if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()
    file_name = input("Enter file location : ")
    fileNameInput(file_name, spark)

When I run the code above, it throws the following error:

 File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling …
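The traceback above is truncated, so the actual cause cannot be determined from it. Two things are still worth ruling out before calling `spark.read.json`, and the sketch below checks both using only the standard library. First, by default Spark expects one complete JSON object per line (JSON Lines); a pretty-printed document spanning several lines needs the `multiLine` option. Second, `os.path.isfile` checks the local filesystem, while Spark resolves a bare path against its configured default filesystem, which may be HDFS, so an explicit `file://` URI removes that ambiguity. The helper names `to_file_uri`, `json_layout`, and `read_options` are hypothetical and are not part of the original post or of PySpark.

```python
import json
from pathlib import Path


def to_file_uri(path):
    """Return an explicit file:// URI for a local path."""
    return Path(path).resolve().as_uri()


def _is_json_object(line):
    """True if the line on its own parses as a JSON object."""
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False


def json_layout(path):
    """Classify a file as 'json_lines', 'multiline', or 'invalid'."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "invalid"
    # Every non-empty line is a self-contained object: Spark's default layout.
    if all(_is_json_object(line) for line in lines):
        return "json_lines"
    # Otherwise the file may be a single document spread over several lines.
    try:
        json.loads(text)
        return "multiline"
    except json.JSONDecodeError:
        return "invalid"


def read_options(path):
    """Hypothetical helper: pick the URI and reader options for Spark."""
    return to_file_uri(path), {"multiLine": json_layout(path) == "multiline"}
```

Assuming the `spark` session and `file_name` from main.py, the result could be used as `uri, opts = read_options(file_name)` followed by `spark.read.options(**opts).json(uri)`. A `file://` path only works when the file is readable from wherever the Spark executors run, for example in local mode or on a shared mount.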

apache-spark pyspark
