I am using spark-1.5.0-cdh5.6.0. I tried the sample application (Scala); the command is:
> spark-submit --class com.cloudera.spark.simbox.sparksimbox.WordCount --master local /home/hadoop/work/testspark.jar
and I get the following error:
ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/user/spark/applicationHistory does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:424)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
at com.cloudera.spark.simbox.sparksimbox.WordCount$.main(WordCount.scala:12)
at com.cloudera.spark.simbox.sparksimbox.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
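The stack trace shows `EventLoggingListener` failing to start: event logging is enabled (Cloudera's default) and `spark.eventLog.dir` points at `/user/spark/applicationHistory`, but with `--master local` Spark resolves that path against the local filesystem (note the `file:/` scheme in the error), where the directory does not exist. A hedged sketch of two possible workarounds, assuming the Cloudera default configuration:

```shell
# Option 1: create the directory the event-log listener expects
# on the local filesystem.
sudo mkdir -p /user/spark/applicationHistory
sudo chown -R "$USER" /user/spark/applicationHistory

# Option 2: disable event logging for this run instead.
spark-submit --class com.cloudera.spark.simbox.sparksimbox.WordCount \
  --master local \
  --conf spark.eventLog.enabled=false \
  /home/hadoop/work/testspark.jar
```

Option 1 keeps the application history available to the History Server; option 2 is the quicker fix for local experiments.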
I am trying to compile a simple Scala program that uses StreamingContext. Here is my code snippet:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerStageCompleted
import org.apache.spark.streaming.StreamingContext._ // error: object streaming is not a member of package org.apache.spark

object FileCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(args(0))
    val ssc = new StreamingContext(sc, Seconds(10)) // error: not found: type StreamingContext
    sc.stop()
  }
}
I get these two errors:
object streaming is not a member of package org.apache.spark
and
not found: type StreamingContext
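Both errors typically mean the spark-streaming artifact is not on the compile classpath, and that the `Seconds` and `StreamingContext` names are never imported (the wildcard import only brings in the companion object's members, not the type itself). A minimal sketch, assuming an sbt build against Spark 1.5.0:

```scala
// build.sbt — spark-streaming must be declared alongside spark-core:
// libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.5.0"

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext} // brings both names into scope

object FileCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local[2]") // streaming needs at least 2 local threads
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10)) // resolves once the import compiles
    ssc.stop(stopSparkContext = true)
  }
}
```

Once the dependency is resolved, the `object streaming is not a member of package org.apache.spark` message disappears and the explicit import makes the `StreamingContext` type visible.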
Any help would be appreciated!
Suppose this is my CSV file:
11111;44444
22222;55555
11111;44444
33333;99999
11111;77777
22222;99999
For each value in the first column, I want the number of distinct values in the second column, like this:
(11111,2)
(22222,2)
(33333,1)
I tried:
object CountDestOcc {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Word Count")
      .setMaster("local")
    val sc = new SparkContext(conf)
    // load the text file into an RDD
    val textFile = sc.textFile(args(0))
    val appsdest = textFile.flatMap(line => line.split(" ")).map(p => (p, 1)).reduceByKey(_ + _).collect()
    appsdest.foreach(println)
    sc.stop()
  }
}
and I get:
(22222;55555,1)
(22222;99999,1)
(11111;77777,1)
(11111;44444,2)
(33333;99999,1)
How can I consolidate on the first key to get the expected result?
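The current code splits on a space, so each whole `11111;44444` line becomes the key. One way to get the expected counts is to split on `;` (the file's actual separator), deduplicate the (first, second) pairs, then count per first-column key. A sketch under those assumptions, reusing the `textFile` RDD from above:

```scala
val appsdest = textFile
  .map(_.split(";"))                // the separator is ';', not a space
  .map(cols => (cols(0), cols(1))) // (first column, second column)
  .distinct()                      // count each distinct pair only once
  .map { case (key, _) => (key, 1) }
  .reduceByKey(_ + _)

appsdest.collect().foreach(println)
// for the sample file: (11111,2), (22222,2), (33333,1), in some order
```

The `distinct()` step is what makes the duplicate `11111;44444` line count once; dropping it would count raw occurrences instead (giving `(11111,3)`).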