Siv*_*iva 3 scala apache-spark
I am trying to add filter, as shown in the examples, to my program:
val logFile = "/tmp/master.txt"
val sc = new JavaSparkContext("local[4]", "Twitter Analyzer", "/home/welcome/Downloads/spark-1.1.0/",Array("target/scala-2.10/Simple-assembly-0.1.0.jar"))
val twitterFeed = sc.textFile(logFile).cache()
while (iterator.hasNext) {
  val value = iterator.next()
  val numAs = twitterFeed.filter(line => line.contains(value))
  numAs.saveAsTextFile("/tmp/output/positive/" + value)
}
I get the following compilation error:
[info] Compiling 1 Scala source to /home/siva/file1/target/scala-2.10/classes...
[error] /home/siva/file1/src/main/scala/com/chimpler/example/twitter/Tweet.scala:27: missing parameter type
[error] val numAs = twitterFeed.filter(line => line.contains(value))
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 5 s, completed 19 Sep, 2014 1:31:26 PM
Any ideas?
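As @groverboy suggested in the comments, you should use org.apache.spark.SparkContext instead. The "Initializing Spark" section of the Spark Programming Guide is also clear on this.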
import org.apache.spark._
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("Twitter Analyzer")
  .setSparkHome("/home/welcome/Downloads/spark-1.1.0/")
  .setJars(Seq("target/scala-2.10/Simple-assembly-0.1.0.jar"))
val sc = new SparkContext(conf)
The reason is that type inference in Scala needs a type context to infer the type of the line parameter.
val numAs = twitterFeed.filter(line => line.contains(value))
It is obviously of type String, but with the Java version of SparkContext (JavaSparkContext) you simply lose that type information.
If you use SparkContext, the filter line can be simplified further to:
val numAs = twitterFeed.filter(_.contains(value))
or even:
twitterFeed.filter(_ contains value)
All of these goodies come with SparkContext.
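To see the role of the type context without a Spark installation, here is a minimal sketch in plain Scala. It uses an ordinary Seq[String] in place of the RDD; the object name, the sample lines, and the helper names are all hypothetical and chosen only for illustration.

```scala
object FilterInferenceSketch {
  // Hypothetical stand-in for the lines read from the input file.
  val lines: Seq[String] = Seq("good morning", "bad weather", "good night")

  // Seq[String].filter expects a String => Boolean, so the compiler
  // infers `line: String` from that expected type.
  def full(value: String): Seq[String] =
    lines.filter(line => line.contains(value))

  // Placeholder syntax relies on the same expected type.
  def short(value: String): Seq[String] =
    lines.filter(_.contains(value))

  // Infix form of the same call.
  def infix(value: String): Seq[String] =
    lines.filter(_ contains value)

  // With no expected type there is nothing to infer from:
  //   val p = line => line.contains("good")   // error: missing parameter type
  // Declaring the function type supplies the missing information.
  val annotated: String => Boolean = line => line.contains("good")

  def main(args: Array[String]): Unit = {
    println(full("good"))
    println(short("good") == full("good"))
    println(lines.filter(annotated) == infix("good"))
  }
}
```

The three filter forms are interchangeable here because the collection's element type tells the compiler what `line` is. The commented-out line fails in the same way as the error in the question, since no expected function type is available there.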