
How do I connect PyCharm with PySpark?

I'm new to Apache Spark, and apparently I installed apache-spark with Homebrew on my MacBook:

Last login: Fri Jan  8 12:52:04 on console
user@MacBook-Pro-de-User-2:~$ pyspark
Python 2.7.10 (default, Jul 13 2015, 12:05:58)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/08 14:46:44 INFO SparkContext: Running Spark version 1.5.1
16/01/08 14:46:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/08 14:46:47 INFO SecurityManager: Changing view acls to: user
16/01/08 14:46:47 INFO …
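
One approach often used to make a Spark installation like this importable from an IDE interpreter such as PyCharm's is to put Spark's bundled Python sources on `sys.path` before importing `pyspark`. The sketch below assumes the usual layout of a Spark distribution (a `python/` directory and a `python/lib/py4j-*-src.zip` archive under the Spark home). The helper names `pyspark_paths` and `add_pyspark_to_path` are hypothetical, and the Spark home has to be supplied by you.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the entries that need to be on sys.path to import pyspark.

    Assumption: <spark_home>/python holds the pyspark package and
    <spark_home>/python/lib holds a py4j-*-src.zip archive.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_pyspark_to_path(spark_home):
    """Prepend the Spark Python sources to sys.path, skipping duplicates."""
    # Only set SPARK_HOME if the environment does not define it already.
    os.environ.setdefault("SPARK_HOME", spark_home)
    for path in reversed(pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

Calling `add_pyspark_to_path(...)` with your own Spark home at the top of a script, before `import pyspark`, is one way to try this. With Homebrew the Spark home is typically the `libexec` directory inside the apache-spark keg, but that is worth checking on your own machine.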

python homebrew pycharm apache-spark pyspark

Score: 71, Answers: 4, Views: 70k

Installing SparkR

I have the latest version of R, 3.2.1. Now I want to install SparkR in R. After running:

> install.packages("SparkR")

I got this back:

Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘SparkR’ is not available (for R version 3.2.1)

I have also installed Spark on my machine:

Spark 1.4.0

How can I solve this problem?

r apache-spark sparkr

Score: 46, Answers: 2, Views: 30k

Key not found: _PYSPARK_DRIVER_CALLBACK_HOST

I am trying to run this code:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .master("local") \
        .appName("Word Count") \
        .getOrCreate()

df = spark.createDataFrame([
    (1, 144.5, 5.9, 33, 'M'),
    (2, 167.2, 5.4, 45, 'M'),
    (3, 124.1, 5.2, 23, 'F'),
    (4, 144.5, 5.9, 33, 'M'),
    (5, 133.2, 5.7, 54, 'F'),
    (3, 124.1, 5.2, 23, 'F'),
    (5, 129.2, 5.3, 42, 'M'),
   ], ['id', 'weight', 'height', 'age', 'gender'])

df.show()
print('Count of Rows: {0}'.format(df.count()))
print('Count of distinct Rows: {0}'.format((df.distinct().count())))

spark.stop()

and I get an error:

18/06/22 11:58:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in …
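
A frequently reported cause of this kind of error is a mismatch between the `pyspark` package that Python imports and the Spark installation that `SPARK_HOME` points to, so comparing the two versions is a reasonable first check. The sketch below assumes the Spark home contains a `RELEASE` file whose first line looks like `Spark 2.3.0 built for Hadoop 2.7.3`, as the binary distributions do; other installs may lack it. The helper names are hypothetical.

```python
import os
import re


def spark_home_version(spark_home):
    """Read the Spark version from <spark_home>/RELEASE, or None if unavailable.

    Assumption: the first line starts with 'Spark <version>'.
    """
    release_file = os.path.join(spark_home, "RELEASE")
    if not os.path.isfile(release_file):
        return None
    with open(release_file) as fh:
        match = re.match(r"Spark\s+(\d+(?:\.\d+)*)", fh.readline())
    return match.group(1) if match else None


def check_pyspark_matches(spark_home, pyspark_version):
    """Return a short, human-readable comparison of the two versions."""
    installed = spark_home_version(spark_home)
    if installed is None:
        return "could not determine the Spark version under SPARK_HOME"
    if installed != pyspark_version:
        return "mismatch: SPARK_HOME has {0}, pyspark is {1}".format(
            installed, pyspark_version)
    return "ok: both are {0}".format(installed)
```

With `pyspark` importable, something like `check_pyspark_matches(os.environ["SPARK_HOME"], pyspark.__version__)` would show whether the two agree. If they differ, aligning the versions is the usual next step.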

python apache-spark pyspark

Score: 8, Answers: 2, Views: 9671

In PySpark I get py4j.protocol.Py4JError: py4j.Py4JException: Method isBarrier([]) does not exist

This exception is raised at lines.count().

Exception has occurred: py4j.protocol.Py4JError
An error occurred while calling o26.isBarrier. Trace:
py4j.Py4JException: Method isBarrier([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at …

Code:

    from pyspark import SparkContext
    from pyspark import SparkConf

    # Configure the application name, then start a local context.
    conf = SparkConf()
    conf.setAppName("First App")

    sc = SparkContext('local', conf=conf)
    print("-----------------------------------------------------------------------------")
    lines = sc.textFile("sample.csv")
    print("-----------------------------------------------------------------------------")
    # The exception is raised here.
    lines.count()
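
A "method does not exist" error from Py4J generally means the Python side called a JVM method that the running Spark does not have, which can happen when the `pyspark` package is newer than the Spark JVM it talks to. A minimal sketch of a check, assuming both sides report dotted version strings; the helper name `same_release_line` is hypothetical:

```python
def same_release_line(python_side, jvm_side):
    """Return True if two version strings share major.minor, e.g. '2.4.0' and '2.4.3'."""
    def major_minor(version):
        return tuple(version.split(".")[:2])
    return major_minor(python_side) == major_minor(jvm_side)


# Hypothetical usage with the live context `sc` from the code above:
#   import pyspark
#   same_release_line(pyspark.__version__, sc.version)
```

If this returns False, the Python package and the JVM come from different Spark release lines, and aligning them is worth trying before looking elsewhere.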

py4j apache-spark pyspark

Score: 5, Answers: 0, Views: 4806

Tag statistics

apache-spark ×4

pyspark ×3

python ×2

homebrew ×1

py4j ×1

pycharm ×1

r ×1

sparkr ×1