Related troubleshooting solutions (0)

PySpark --py-files doesn't work

I am using --py-files as this document suggests: http://spark.apache.org/docs/1.1.1/submitting-applications.html

Spark version: 1.1.0

./spark/bin/spark-submit --py-files /home/hadoop/loganalysis/parser-src.zip \
/home/hadoop/loganalysis/ship-test.py 

And the conf in the code:

conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("LogAnalysis")
        .set("spark.executor.memory", "1g")
        .set("spark.executor.cores", "4")
        .set("spark.executor.num", "2")
        .set("spark.driver.memory", "4g")
        .set("spark.kryoserializer.buffer.mb", "128"))

And the slave nodes complain with an ImportError:

14/12/25 05:09:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-172-31-10-8.cn-north-1.compute.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/hadoop/spark/python/pyspark/serializers.py", line 150, in _read_with_length
    return self.loads(obj)
ImportError: No module named parser

And parser-src.zip was tested locally:

[hadoop@ip-172-31-10-231 ~]$ python
Python 2.7.8 (default, Nov  3 2014, 10:17:30) 
[GCC 4.8.2 20140120 …
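One thing worth checking for a question like this is whether the module sits at the root of the archive, since a zip on the import path is searched from its top level. The sketch below is only an illustration under stated assumptions: it uses Python 3 rather than the Python 2.7 shown above, and the archive name `demo-src.zip`, the module name `logparser_demo`, and both helper functions are hypothetical stand-ins, not the asker's files.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(zip_path, module_name, source):
    """Write a single-module archive with the .py file at the archive root."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(module_name + ".py", source)


def import_from_zip(zip_path, module_name):
    """Import module_name while the archive is temporarily first on sys.path."""
    sys.path.insert(0, zip_path)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(zip_path)


# Hypothetical demo archive and module, created in a temporary directory.
tmp_dir = tempfile.mkdtemp()
demo_zip = os.path.join(tmp_dir, "demo-src.zip")
build_zip(demo_zip, "logparser_demo", "def parse(line):\n    return line.split()\n")

demo_module = import_from_zip(demo_zip, "logparser_demo")
print(demo_module.parse("GET /index.html 200"))
```

If the import only succeeds when the archive is laid out this way, the layout of the real zip is a likely place to look. An archive can also be registered from inside the driver program with `SparkContext.addPyFile`, which may be an alternative to the command-line flag.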

python hadoop emr apache-spark

17
votes
3
answers
30k
views

How can I check whether a module is installed in Python and, if it is not, install it from within the code?

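I would like to install the modules 'mutagen' and 'gTTS' for my code. I want the code to install them on any computer that does not have them, but not to attempt an install when they are already present. I currently have: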

import pip  # missing in the snippet as posted; note that pip.main() was removed in pip 10

def install(package):
    pip.main(['install', package])

install('mutagen')

install('gTTS')

from gtts import gTTS
from mutagen.mp3 import MP3

However, if you already have the modules, this just adds unnecessary clutter at program startup every time you open it.
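One possible way to get this behaviour is to look the module up first and call pip only when the lookup fails. This is a minimal sketch, not a definitive answer: the helper names `is_installed` and `ensure_installed` are hypothetical, it assumes Python 3 with pip available to the running interpreter, and it only handles top-level module names.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name, package_name=None):
    """Run pip only when module_name is missing.

    package_name is the name to pass to pip when it differs from the
    import name. Returns True if an install ran, False if none was needed.
    """
    if is_installed(module_name):
        return False
    # Use the current interpreter's pip so the package lands in the same environment.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", package_name or module_name]
    )
    return True


# Safe demonstration with a standard-library module: nothing gets installed.
print(ensure_installed("json"))
```

For the modules in the question, the calls would look like `ensure_installed("mutagen")` and `ensure_installed("gtts", "gTTS")`, placed before the `from gtts import gTTS` and `from mutagen.mp3 import MP3` lines. Running pip through `sys.executable -m pip` avoids relying on `pip.main`, which newer pip releases no longer expose.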

python module python-module python-3.x

11
votes
4
answers
30k
views

Tag statistics

python ×2

apache-spark ×1

emr ×1

hadoop ×1

module ×1

python-3.x ×1

python-module ×1