Posts by ajk*_*jkl

ImportError: No module named numpy on Spark workers

I launch pyspark in client mode with bin/pyspark --master yarn-client --num-executors 60. import numpy works fine in the shell, but it fails inside KMeans. My feeling is that the executors somehow don't have numpy installed. I haven't found a good way to make the workers aware of numpy. I tried setting PYSPARK_PYTHON, but that didn't work either. (One common workaround is sketched after the stack trace below.)

import numpy

# Load the precomputed feature matrix (first array in the archive).
features = numpy.load(open("combined_features.npz"))
features = features['arr_0']
features.shape

# Distribute the feature matrix across 5000 partitions.
features_rdd = sc.parallelize(features, 5000)

from pyspark.mllib.clustering import KMeans, KMeansModel
from numpy import array
from math import sqrt

# Fails: the executors' Python cannot import numpy.
clusters = KMeans.train(features_rdd, 2, maxIterations=10, runs=10, initializationMode="random")

Stack trace:

 org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
  File "/hadoop/3/scratch/local/usercache/ajkale/appcache/application_1451301880705_525011/container_1451301880705_525011_01_000011/pyspark.zip/pyspark/mllib/__init__.py", line 25, in <module>

ImportError: …
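An editorial note: the usual cause is that the executors launch a different Python interpreter than the driver, one that cannot import numpy. Below is a minimal sketch of the common workaround, assuming a hypothetical interpreter path /usr/bin/python2.7 that has numpy installed on every node. If the nodes simply lack numpy, it has to be installed there first; no Spark setting can substitute for that.

import os
from pyspark import SparkConf, SparkContext

# Hypothetical path: an interpreter with numpy installed on every node.
python_with_numpy = "/usr/bin/python2.7"

# Driver side; must be set before the SparkContext is created.
os.environ["PYSPARK_PYTHON"] = python_with_numpy

conf = (SparkConf()
        .setMaster("yarn-client")
        # spark.executorEnv.* forwards an environment variable to executors.
        .set("spark.executorEnv.PYSPARK_PYTHON", python_with_numpy))
sc = SparkContext(conf=conf)

# Sanity check: fails on any worker whose interpreter lacks numpy.
print(sc.parallelize(range(4), 4)
        .map(lambda _: __import__("numpy").__version__)
        .collect())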

python numpy apache-spark pyspark

14 votes · 2 answers · 30k views

Accessing an HBase table through Spark

I am using the code sample at http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase to read an HBase table with Spark. The only change I made was to set hbase.zookeeper.quorum through code, since it was not being picked up from hbase-site.xml.

Spark 1.5.3, HBase 0.98.0

I am running into this error:

java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
at org.apache.hadoop.hbase.protobuf.RequestConverter.buildRegionSpecifier(RequestConverter.java:921)
at org.apache.hadoop.hbase.protobuf.RequestConverter.buildGetRowOrBeforeRequest(RequestConverter.java:132)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1520)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1294)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1128)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1111)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1070)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:347)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at test.MyHBase.getTable(MyHBase.scala:33)
at test.MyHBase.<init>(MyHBase.scala:11)
at $line43.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.fetch(<console>:30)
at $line44.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:49)
at $line44.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
at scala.collection.AbstractIterator.to(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at …
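An editorial note: this particular IllegalAccessError is a known HBase classloading problem. HBaseZeroCopyByteString ships in the hbase-protocol jar and deliberately lives in the com.google.protobuf package, so it must be loaded by the same classloader as the rest of protobuf. A common workaround is to put hbase-protocol explicitly on the classpath; a sketch with a hypothetical jar location:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical location of the hbase-protocol jar on the cluster nodes.
val hbaseProtocol = "/opt/hbase/lib/hbase-protocol-0.98.0.jar"

val conf = new SparkConf()
  .setAppName("hbase-read")
  // Executors pick this up when they launch, so HBaseZeroCopyByteString
  // is loaded alongside the protobuf classes it extends.
  .set("spark.executor.extraClassPath", hbaseProtocol)
// In client mode the driver JVM is already running by the time SparkConf
// is read, so set the driver's classpath at submit time instead:
//   spark-submit --driver-class-path /opt/hbase/lib/hbase-protocol-0.98.0.jar ...

val sc = new SparkContext(conf)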

hadoop hbase scala apache-spark

11 votes · 1 answer · 945 views

sbt assembly shading to create a fat jar to run on Spark

I am using sbt assembly to create a fat jar that I can run on Spark. One of my dependencies is grpc-netty. The Guava version on Spark is older than the one grpc-netty needs, and I run into this error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I was able to get around it by setting userClassPathFirst to true on Spark, but that causes other errors.

Correct me if I am wrong, but as I understand it, I shouldn't need to set userClassPathFirst to true if I do the shading correctly. This is how I do the shading right now:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.guava.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)

libraryDependencies ++= Seq(
  "org.scalaj" %% "scalaj-http" % "2.3.0",
  "org.json4s" %% "json4s-native" % "3.2.11",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.2.0" % "provided",
  "org.clapper" %% "argot" % "1.0.3",
  "com.typesafe" % "config" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % …
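One thing worth checking (an editorial observation): Guava's classes live under com.google.common, not com.google.guava, so the rename pattern above never matches anything and the Guava classes end up unshaded in the fat jar. A sketch of the rule with the corrected package pattern:

assemblyShadeRules in assembly := Seq(
  // Guava's code is packaged under com.google.common.*, so that is the
  // prefix the rename rule has to match.
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
    .inProject
)

The inProject clause is only needed if your own sources call Guava directly, so that those call sites are rewritten to the shaded package as well.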

sbt guava sbt-assembly apache-spark grpc

11 votes · 1 answer · 1048 views

Why does this list comprehension return an Array{Any,1} rather than an Array{Symbol,1}?

When I try to create an Array with a list comprehension, it produces an Array{Any, 1} even though I code every element as a symbol:

julia> u_col_names=[symbol("user_id"), symbol("age"), symbol("sex"), symbol("occupation"), symbol("zip_code")]
5-element Array{Symbol,1}:
 :user_id   
 :age       
 :sex       
 :occupation
 :zip_code 

julia> col_names=["user_id", "age", "sex", "occupation", "zip_code"]
5-element Array{ASCIIString,1}:
 "user_id"   
 "age"       
 "sex"       
 "occupation"
 "zip_code"  

julia> u_col_names=[symbol(col_names[i]) for i in 1:size(col_names)[1]]
5-element Array{Any,1}:
 :user_id   
 :age       
 :sex       
 :occupation
 :zip_code 

Why does the last list comprehension return Array{Any, 1} rather than Array{Symbol, 1}? Note that the following does return Array{Symbol, 1}:

julia> u_col_names=[symbol("col_names$i") for i in 1:size(col_names)[1]]
5-element Array{Symbol,1}:
 :col_names1
 :col_names2
 :col_names3
 :col_names4
 :col_names5

Interestingly, so does the following:

julia> col_names[1]
"user_id"

julia> symbol(col_names[1])
:user_id

julia> [symbol(col_names[1]), …
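An editorial note on the likely cause: col_names is a non-constant global, and type inference in 0.3/0.4-era Julia (the vintage that symbol and ASCIIString suggest) cannot assume anything about a global's type, so the comprehension's element type falls back to Any. symbol("col_names$i") works because its argument is a literal string whose type is known. A sketch of the usual workarounds:

# 1. State the element type explicitly.
u_col_names = Symbol[symbol(name) for name in col_names]

# 2. Or map a function over the array, which infers from the elements.
u_col_names = map(symbol, col_names)

# 3. Or move the comprehension into a function, where the argument's
#    type is known and inference succeeds.
to_symbols(names) = [symbol(name) for name in names]
u_col_names = to_symbols(col_names)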

arrays type-inference julia

7 votes · 1 answer · 243 views

Date-formatting a DataFrame string column

Is there a way to do something like this (this is R):

df$dataCol <- as.Date(df$dataCol, format="%Y%m%d")

where dataCol has the format "20151009".

  1. Is there a way to change the column type to a date in Julia?
  2. I have not found a way to do this with the Date.jl package. (See the sketch after this list.)
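For reference, a hedged sketch of one way this can be done in 0.4-era Julia with DataFrames, assuming df is a DataFrame whose dataCol holds strings like "20151009"; note that the needed Date functionality ships with Julia itself rather than a separate Date.jl package:

using DataFrames

# Date is available from Base in Julia 0.4 (on 0.3 it comes from the
# Dates.jl package via `using Dates`).
df[:dataCol] = [Date(s, "yyyymmdd") for s in df[:dataCol]]

# If the column holds integers rather than strings, convert first:
# df[:dataCol] = [Date(string(x), "yyyymmdd") for x in df[:dataCol]]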

julia

5 votes · 1 answer · 688 views

Strange error compiling the main module with g++

I am trying to compile the code below with "g++ main.cpp -c", but it gives me this strange error. Any ideas?

main.cpp: In function ‘int main()’:
main.cpp:9:17: error: invalid conversion from ‘Graph*’ to ‘int’
main.cpp:9:17: error:   initializing argument 1 of ‘Graph::Graph(int)’
main.cpp:10:16: warning: deprecated conversion from string constant to ‘char*’

Here is the main module I am trying to compile, and below it is my Graph class from graph.hpp:

#include <iostream>
#include "graph.hpp"

using namespace std;

int main()
{
  Graph g;
  g = new Graph();
  char* path = "graph.csv";
  g.createGraph(path);
  return 0;
}

Here is my Graph class:

/*
 * graph.hpp
 *
 *  Created on: Jan 28, 2012
 *      Author: ajinkya
 */

#ifndef _GRAPH_HPP_
#define _GRAPH_HPP_

#include "street.hpp"
#include …
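An editorial note on the error: g = new Graph() yields a Graph*, and since g is a Graph value the compiler's only route for the assignment is an implicit conversion through the Graph(int) constructor, which is where the bogus pointer-to-int conversion comes from; the warning on the next line is from binding a string literal to a non-const char*. A sketch of the usual fix, assuming createGraph takes a char* and Graph is default-constructible (the original Graph g; line compiling cleanly suggests it is):

#include "graph.hpp"

int main()
{
    Graph g;                    // automatic storage; no Java-style 'new' needed
    char path[] = "graph.csv";  // mutable buffer, so passing it to a char*
                                // parameter raises no deprecation warning
    g.createGraph(path);
    return 0;
}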

c++ g++

2 votes · 1 answer · 360 views

Tag statistics

apache-spark ×3

julia ×2

arrays ×1

c++ ×1

g++ ×1

grpc ×1

guava ×1

hadoop ×1

hbase ×1

numpy ×1

pyspark ×1

python ×1

sbt ×1

sbt-assembly ×1

scala ×1

type-inference ×1