RDD map function works differently

Pri*_*ain 0 scala apache-spark rdd

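I have the code below. Normally map is a higher-order function: it takes a function as its argument and applies that function to each element. In this case, though, the argument passed to map is not a function but a value of type Map. I can't understand how the map call works here.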

Spark context available as sc (master = yarn-client, app id = application_1473775536920_2711).
SQL context available as sqlContext.

scala> val pws = Map("Apache Spark" -> "http://spark.apache.org/", "Scala" -> "http://www.scala-lang.org/")
pws: scala.collection.immutable.Map[String,String] = Map(Apache Spark -> http://spark.apache.org/, Scala -> http://www.scala-lang.org/)

scala> val websites = sc.parallelize(Seq("Apache Spark", "Scala")).map(pws).collect
16/09/23 02:50:15 WARN util.ClosureCleaner: Expected a closure; got scala.collection.immutable.Map$Map2
[Stage 0:>                                                          (0 + 0) / 2]16/09/23 02:50:31 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
websites: Array[String] = Array(http://spark.apache.org/, http://www.scala-lang.org/)

Sim*_*mon 6

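The trait Map[A, +B] extends the trait Function1[-T1, +R] (indirectly, through PartialFunction[A, B]). In other words, a Map is a function. In your case you have a Map[String, String], which means your map has a method def apply(arg: String): String, and that method is applied to every element of your RDD.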

So even in plain Scala, without Spark, you can do something like this:

val m = Map(("a" -> "b"), ("c" -> "d"))
val s = Seq("a", "c")

s.map(m)
res0: Seq[String] = List(b, d)

For this to compile, the types of m and s need to match: the element type of s must be the key type of m.
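
To make the point concrete, here is a minimal plain-Scala sketch (no Spark required). The object name MapAsFunction and the values f, viaApply, viaF and safe are made up for illustration; m and s repeat the sample data above. It shows that a Map can be assigned to a function type, that passing the Map to map is the same as calling its apply method on each element, and what the type-matching requirement looks like in practice.

```scala
object MapAsFunction {
  // Same sample data as in the example above.
  val m: Map[String, String] = Map("a" -> "b", "c" -> "d")
  val s: Seq[String] = Seq("a", "c")

  // A Map[String, String] can be used wherever a String => String is expected.
  val f: String => String = m

  // These three calls are equivalent: each looks up every element of s in m.
  val direct: Seq[String]   = s.map(m)
  val viaApply: Seq[String] = s.map(k => m.apply(k))
  val viaF: Seq[String]     = s.map(f)

  // Would not compile: the elements are Int, but m expects String keys.
  // Seq(1, 2).map(m)

  // m.apply throws NoSuchElementException for a key that is not in the map.
  // Passing m.get instead yields an Option per element, so missing keys are safe.
  val safe: Seq[Option[String]] = Seq("a", "z").map(m.get)

  def main(args: Array[String]): Unit = {
    println(direct)
    println(safe)
  }
}
```

The same reasoning applies to the RDD in the question: sc.parallelize(...).map(pws) behaves like .map(key => pws(key)), so a key that is absent from pws would make the job fail when the lookup runs.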