Ada*_*cha 4 macos apache-spark apache-zeppelin
When I run the following code:
val home = "/Users/adremja/Documents/Kaggle/outbrain"
val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories take(10) foreach println
in the spark-shell, it works perfectly:
scala> val home = "/Users/adremja/Documents/Kaggle/outbrain"
home: String = /Users/adremja/Documents/Kaggle/outbrain
scala> val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories: org.apache.spark.rdd.RDD[String] = /Users/adremja/Documents/Kaggle/outbrain/documents_categories.csv MapPartitionsRDD[21] at textFile at <console>:26
scala> documents_categories take(10) foreach println
document_id,category_id,confidence_level
1595802,1611,0.92
1595802,1610,0.07
1524246,1807,0.92
1524246,1608,0.07
1617787,1807,0.92
1617787,1608,0.07
1615583,1305,0.92
1615583,1806,0.07
1615460,1613,0.540646372
But when I try to run it in Zeppelin, I get an error:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.SparkContext.withScope(SparkContext.scala:679)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:797)
... 46 elided
Do you know where the problem is?
I have Spark 2.0.1 from Homebrew (which I linked as SPARK_HOME in zeppelin-env.sh) and the Zeppelin 0.6.2 binary from the Zeppelin website.
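For reference, the relevant line in conf/zeppelin-env.sh would look something like the sketch below; the Homebrew install path is an assumption and may differ on your machine:

```shell
# conf/zeppelin-env.sh
# Point Zeppelin at the Homebrew-installed Spark.
# The exact path is an assumption for a default Homebrew layout; check
# `brew --prefix apache-spark` on your own machine.
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.0.1/libexec
```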
OK, it looks like I found a solution. From Zeppelin's lib folder I removed:
and replaced them with version 2.6.5, which Spark uses.
It works now, but I don't know whether I've broken anything else.
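The NoClassDefFoundError on RDDOperationScope$ is typically a Jackson version conflict: RDDOperationScope initializes a Jackson ObjectMapper, and Zeppelin 0.6.2 ships a newer Jackson than the 2.6.5 that Spark 2.0.1 was built against, so whichever copy wins on Zeppelin's classpath breaks the other. A sketch of how one might check for and fix the mismatch; the directory layout and exact jar file names are assumptions based on default install locations, so verify them on your own system before moving anything:

```shell
# List the Jackson jars each side ships (paths assume default install layouts)
ls "$ZEPPELIN_HOME/lib"  | grep jackson
ls "$SPARK_HOME/jars"    | grep jackson

# If the versions differ, back up Zeppelin's copies rather than deleting them,
# then drop in the 2.6.5 jars that Spark uses (file names are assumptions)
mkdir -p "$ZEPPELIN_HOME/lib/jackson-backup"
mv "$ZEPPELIN_HOME"/lib/jackson-*.jar "$ZEPPELIN_HOME/lib/jackson-backup/"
cp "$SPARK_HOME"/jars/jackson-*-2.6.5.jar "$ZEPPELIN_HOME/lib/"
```

Keeping the backup folder makes it easy to roll back if, as the answer worries, something else in Zeppelin depended on the newer Jackson.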
Views: 3605