Related solutions (0)

ImportError: No module named numpy on spark workers

I am launching pyspark in client mode (`bin/pyspark --master yarn-client --num-executors 60`). `import numpy` works fine in the shell, but it fails inside KMeans. My feeling is that numpy is not installed on the executors, and I have not found a good way to make the workers aware of numpy. I tried setting PYSPARK_PYTHON, but that did not work either.
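For reference, PYSPARK_PYTHON only takes effect if it is set in the driver's environment before the SparkContext is created; exporting it afterwards is silently ignored. A minimal sketch, where the interpreter path is an assumption (substitute a path that actually has numpy installed on every node):

```python
import os

# Assumption: this interpreter exists on every node and has numpy installed.
# The variable must be set BEFORE the SparkContext is created; changing it
# after startup has no effect on already-launched executors.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python2.7"

# Then create the context as usual, e.g.:
# from pyspark import SparkConf, SparkContext
# sc = SparkContext(conf=SparkConf().setMaster("yarn-client"))
```

Alternatively, the same setting can go into `conf/spark-env.sh` on the driver so every `pyspark` session picks it up.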

import numpy
from math import sqrt
from pyspark.mllib.clustering import KMeans, KMeansModel

# np.load opens .npz archives itself; passing a path avoids text-mode file issues
features = numpy.load("combined_features.npz")
features = features['arr_0']
features.shape
features_rdd = sc.parallelize(features, 5000)

clusters = KMeans.train(features_rdd, 2, maxIterations=10, runs=10, initializationMode="random")

Stack trace

 org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/mllib/__init__.py", line 25, in <module>

ImportError: …
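One way to confirm the suspicion that the executors lack numpy is to attempt the import on the workers themselves rather than on the driver. A diagnostic sketch, assuming a live SparkContext `sc` as in the question (the probe function name is made up for illustration):

```python
def executor_numpy_version(_):
    """Try to import numpy where this function runs; return the version or None."""
    try:
        import numpy
        return numpy.__version__
    except ImportError:
        return None

# With the shell's SparkContext `sc`, run the probe across several partitions:
# versions = sc.parallelize(range(8), 8).map(executor_numpy_version).distinct().collect()
# Any None in `versions` means at least one executor cannot import numpy,
# which would produce exactly the ImportError seen in the trace above.
```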

python numpy apache-spark pyspark

14 votes

2 answers

30k views

Tag statistics

apache-spark ×1

numpy ×1

pyspark ×1

python ×1