I'm new to Apache Spark, and apparently I installed apache-spark with Homebrew on my MacBook:
Last login: Fri Jan 8 12:52:04 on console
user@MacBook-Pro-de-User-2:~$ pyspark
Python 2.7.10 (default, Jul 13 2015, 12:05:58)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/08 14:46:44 INFO SparkContext: Running Spark version 1.5.1
16/01/08 14:46:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/08 14:46:47 INFO SecurityManager: Changing view acls to: user
16/01/08 14:46:47 INFO …

I have the latest version of R, 3.2.1. Now I want to install SparkR in R. After running:
> install.packages("SparkR")
I get back:
Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘SparkR’ is not available (for R version 3.2.1)
I also have Spark installed on my machine:
Spark 1.4.0
How can I solve this problem?

I am trying to run this code:
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local") \
    .appName("Word Count") \
    .getOrCreate()
df = spark.createDataFrame([
    (1, 144.5, 5.9, 33, 'M'),
    (2, 167.2, 5.4, 45, 'M'),
    (3, 124.1, 5.2, 23, 'F'),
    (4, 144.5, 5.9, 33, 'M'),
    (5, 133.2, 5.7, 54, 'F'),
    (3, 124.1, 5.2, 23, 'F'),
    (5, 129.2, 5.3, 42, 'M'),
], ['id', 'weight', 'height', 'age', 'gender'])
df.show()
print('Count of Rows: {0}'.format(df.count()))
print('Count of distinct Rows: {0}'.format((df.distinct().count())))
spark.stop()
and I get an error:
18/06/22 11:58:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in …

This exception is raised at lines.count().
Exception has occurred: py4j.protocol.Py4JError
An error occurred while calling o26.isBarrier. Trace:
py4j.Py4JException: Method isBarrier([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at …
Code:
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf()
conf.setAppName("First App")
sc = SparkContext('local',conf=conf)
print("-----------------------------------------------------------------------------")
lines = sc.textFile("sample.csv")
print("-----------------------------------------------------------------------------")
lines.count()
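A possible line of investigation for the isBarrier error, offered as a sketch rather than a confirmed diagnosis: `isBarrier` was added to the JVM-side RDD API in Spark 2.4, so "Method isBarrier([]) does not exist" can appear when a newer `pyspark` Python package talks to an older Spark installation. The snippet below compares the two versions. The helper names (`major_minor`, `versions_compatible`, `spark_home_version`) are made up for illustration, and it assumes that `SPARK_HOME` points at a binary distribution containing a `RELEASE` file whose first line starts with `Spark <version>`; that may not hold for every install.

```python
import os
import re


def major_minor(version):
    """Return (major, minor) parsed from a version string such as '2.3.1'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def versions_compatible(python_side, jvm_side):
    """True when both version strings share the same major.minor release."""
    return major_minor(python_side) == major_minor(jvm_side)


def spark_home_version(spark_home):
    """Read the version from SPARK_HOME/RELEASE, or return None if unavailable.

    Assumption: the distribution ships a RELEASE file whose first line
    starts with 'Spark <version>'.
    """
    path = os.path.join(spark_home, "RELEASE")
    if not os.path.isfile(path):
        return None
    with open(path) as handle:
        match = re.match(r"Spark (\d+\.\d+\.\d+)", handle.readline())
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import pyspark  # the Python-side package
    except ImportError:
        pyspark = None
    home = os.environ.get("SPARK_HOME")
    jvm_version = spark_home_version(home) if home else None
    if pyspark is None or jvm_version is None:
        print("Could not determine both versions; check them manually.")
    elif versions_compatible(pyspark.__version__, jvm_version):
        print("pyspark", pyspark.__version__, "matches Spark", jvm_version)
    else:
        print("Mismatch: pyspark", pyspark.__version__, "vs Spark", jvm_version)
```

If the script reports a mismatch, aligning the two (for example by installing the `pyspark` package version that matches the Spark installation) would be the first thing to try.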