
Python worker failed to connect back

I'm new to Spark and am trying to complete a Spark tutorial: link to the tutorial

After installing it on my local machine (Win10 64-bit, Python 3, Spark 2.4.0) and setting all the environment variables (HADOOP_HOME, SPARK_HOME, etc.), I tried to run a simple Spark job via a WordCount.py file:

from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # Run locally with two worker threads
    conf = SparkConf().setAppName("word count").setMaster("local[2]")
    sc = SparkContext(conf=conf)

    # Read the input file and split each line into words
    lines = sc.textFile("C:/Users/mjdbr/Documents/BigData/python-spark-tutorial/in/word_count.text")
    words = lines.flatMap(lambda line: line.split(" "))

    # Count occurrences of each distinct word (returns a dict to the driver)
    wordCounts = words.countByValue()

    for word, count in wordCounts.items():
        print("{} : {}".format(word, count))
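As a sanity check before debugging the Spark setup itself, the same counting logic can be reproduced in plain Python; if this runs fine, the problem lies in the Spark/worker configuration rather than the job logic. (The inline sample text below is a made-up stand-in for word_count.text.)

```python
from collections import Counter

# Plain-Python equivalent of the Spark job above:
# split lines into words and count each distinct word.
lines = ["to be or not to be"]
words = [w for line in lines for w in line.split(" ")]
word_counts = Counter(words)

for word, count in word_counts.items():
    print("{} : {}".format(word, count))
```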

After running it from the terminal with:

spark-submit WordCount.py

I get the error below. By commenting lines out one by one, I verified that it crashes at:

wordCounts = words.countByValue()

Any idea what I should check to make this work?

Traceback (most recent call last):
  File "C:\Users\mjdbr\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\mjdbr\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Spark\spark-2.4.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 25, in <module> …
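One thing worth checking for a "Python worker failed to connect back" error on Windows is which interpreter the Spark workers launch: if it differs from the driver's Python, the worker can fail to connect. PySpark honors the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, so a hedged check (a suggestion to try, not a confirmed fix for this machine) is to point both at the interpreter running the script before the SparkContext is created:

```python
import os
import sys

# Point both the PySpark driver and its workers at the interpreter
# running this script, so they cannot pick up a different Python.
# (Diagnostic suggestion only; set these before creating SparkContext.)
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

print(os.environ["PYSPARK_PYTHON"])
```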

python windows local apache-spark pyspark

Score: 7 · Answers: 4 · Views: 5905
