Posted by tan*_*ndy

Writing and running PySpark in IntelliJ IDEA

I am trying to work with PySpark in IntelliJ, but I cannot figure out how to install it / set up the project correctly. I can work with Python in IntelliJ, and I can use the pyspark shell, but I cannot tell IntelliJ how to find the Spark files (import pyspark fails with "ImportError: No module named pyspark").

Any tips on how to include/import Spark so that IntelliJ can work with it would be appreciated.

Thanks.

Update:

I tried this code:

from pyspark import SparkContext, SparkConf
spark_conf = SparkConf().setAppName("scavenge some logs")
spark_context = SparkContext(conf=spark_conf)
# Raw strings keep the backslashes literal ("\t" would otherwise be parsed as a tab).
address = r"C:\test.txt"
log = spark_context.textFile(address)

my_result = log.filter(lambda x: 'foo' in x).saveAsTextFile(r'C:\my_result')

It produced the following error message:

Traceback (most recent call last):
  File "C:/Users/U546816/IdeaProjects/sparktestC/.idea/sparktestfile", line 2, in <module>
    spark_conf = SparkConf().setAppName("scavenge some logs")
  File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\conf.py", line 97, in __init__
  File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\context.py", line 221, in _ensure_initialized
  File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\java_gateway.py", line 35, in launch_gateway
  File "C:\Python27\lib\os.py", line 425, in __getitem__
    return self.data[key.upper()]
KeyError: 'SPARK_HOME' …
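(Note: the KeyError shows that os.environ contains no SPARK_HOME entry, i.e. the environment variable is not set for the interpreter that IntelliJ launches. Below is a minimal sketch of one common workaround: set SPARK_HOME and put Spark's bundled Python packages on sys.path before importing pyspark. The install path is taken from the traceback above, and the py4j zip file name is an assumption that varies by Spark version.)

import os
import sys

# Assumed Spark install location, copied from the traceback above; adjust as needed.
os.environ.setdefault("SPARK_HOME", r"C:\Users\U546816\Documents\Spark")
spark_home = os.environ["SPARK_HOME"]

# Make Spark's bundled pyspark and py4j packages importable. The py4j zip name
# below is a guess for Spark 1.3.x; check the actual file under python\lib.
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.8.2.1-src.zip"))

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("scavenge some logs").setMaster("local[*]")
sc = SparkContext(conf=conf)

(Alternatively, SPARK_HOME can be set in the environment variables of the IntelliJ Run/Debug configuration, which avoids hard-coding the path in the script.)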

Tags: python, intellij-idea, apache-spark, pyspark

7 votes · 1 answer · 10k views
