Post by G.S*_*leh

ERROR SparkContext: Error initializing SparkContext

I am using spark-1.5.0-cdh5.6.0 and tried the sample application (Scala). The command is:

> spark-submit --class com.cloudera.spark.simbox.sparksimbox.WordCount --master local /home/hadoop/work/testspark.jar

I got the following error:

 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/user/spark/applicationHistory does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:424)
        at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
        at com.cloudera.spark.simbox.sparksimbox.WordCount$.main(WordCount.scala:12)
        at com.cloudera.spark.simbox.sparksimbox.WordCount.main(WordCount.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
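The trace shows EventLoggingListener.start failing: event logging is enabled, but the configured directory /user/spark/applicationHistory does not exist on the local filesystem, which is where a --master local run resolves it. A minimal sketch of two common workarounds, assuming the job is free to adjust its own SparkConf (the WordCount source is not shown here):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of WordCount's setup; spark.eventLog.enabled
// and spark.eventLog.dir are standard Spark configuration keys.
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("local")
  .set("spark.eventLog.enabled", "false") // simplest: turn event logging off
// ...or keep logging but point it at a location that exists, e.g.:
// .set("spark.eventLog.dir", "hdfs:///user/spark/applicationHistory")
val sc = new SparkContext(conf)

Alternatively, if the cluster's default filesystem is HDFS, creating the directory there (hadoop fs -mkdir -p /user/spark/applicationHistory) should also clear the error.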

scala apache-spark

7 votes · 1 answer · 20K views

object streaming is not a member of package org.apache.spark

I am trying to compile a simple Scala program that uses StreamingContext. Here is my code snippet:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerStageCompleted
import org.apache.spark.streaming.StreamingContext._ // error: object streaming is not a member of package org.apache.spark

object FileCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local")

    val sc = new SparkContext(conf)
    val textFile = sc.textFile(args(0))
    val ssc = new StreamingContext(sc, Seconds(10)) // error: not found: type StreamingContext
    sc.stop()
  }
}

I get these two errors:

object streaming is not a member of package org.apache.spark

not found: type StreamingContext

Any help, please!
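Both errors usually mean the spark-streaming artifact is missing from the compile classpath; it is published separately from spark-core. A minimal sketch, assuming an sbt build (the build file is not shown) and that the versions should match the cluster's Spark:

// build.sbt -- the version number is an assumption; match it to your cluster
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.5.0",
  "org.apache.spark" %% "spark-streaming" % "1.5.0"
)

Note also that the wildcard import StreamingContext._ only brings in the companion object's members; the type itself still needs import org.apache.spark.streaming.{Seconds, StreamingContext}.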

scala apache-spark

5 votes · 1 answer · 7,558 views

Counting occurrences in a CSV file with a Scala Spark RDD

Suppose this is my CSV file:

11111;44444
22222;55555
11111;44444
33333;99999
11111;77777
22222;99999

For each value in the first column, I want the count of distinct values in the second column, like this:

(11111,2)
(22222,2)
(33333,1)

I tried:

import org.apache.spark.{SparkConf, SparkContext}

object CountDestOcc {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Word Count")
      .setMaster("local")

    val sc = new SparkContext(conf)

    // load the text file into an RDD
    val textFile = sc.textFile(args(0))

    val appsdest = textFile.flatMap(line => line.split(" ")).map(p => (p, 1)).reduceByKey(_ + _).collect()
    appsdest.foreach(println)
    sc.stop()
  }
}

I got:

(22222;55555,1)
(22222;99999,1)
(11111;77777,1)
(11111;44444,2)
(33333;99999,1)

How can I consolidate on the first key to get the expected result?
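Since the input lines contain no spaces, line.split(" ") returns each whole line unchanged, so whole col1;col2 pairs get counted. A minimal sketch of one way to get the expected output: split on ";", deduplicate the (first, second) pairs, then count per key.

val appsdest = textFile
  .map(_.split(";"))             // "11111;44444" -> Array("11111", "44444")
  .map(a => (a(0), a(1)))        // (first column, second column)
  .distinct()                    // keep each pair once, so duplicate rows
                                 // like 11111;44444 count only once
  .map { case (k, _) => (k, 1) }
  .reduceByKey(_ + _)
  .collect()
appsdest.foreach(println)        // (11111,2), (22222,2), (33333,1) -- order may vary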

scala apache-spark

2 votes · 1 answer · 1,625 views

Tag statistics: apache-spark ×3 · scala ×3