Posts by Pri*_*ain

Getting "scala.MatchError: 1201 (of class java.lang.Integer)" while creating a DataFrame

I am running the following code to create a DataFrame from a text file.

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.types.{StructType, StringType, StructField}


/**
  * Created by PSwain on 6/19/2016.
  */
object RddToDataframe extends App {

  val scnf=new SparkConf().setAppName("RddToDataFrame").setMaster("local[1]")
  val sc = new SparkContext(scnf)
  val sqlContext = new SQLContext(sc)

  val employeeRdd=sc.textFile("C:\\Users\\pswain\\IdeaProjects\\test1\\src\\main\\resources\\employee")

  //Creating schema

  val employeeSchemaString="id name age"
  val schema = StructType(employeeSchemaString.split(",").map( colNmae => StructField(colNmae,StringType,true)))

  //Creating  RowRdd
  val rowRdd= employeeRdd.map(row => row.split(",")).map(row => Row(row(0).trim.toInt,row(1),row(2).trim.toInt))

  //Creating dataframe = RDD[rowRdd] + schema
  val employeeDF=sqlContext.createDataFrame(rowRdd,schema). registerTempTable("Employee")

  sqlContext.sql("select * from Employee").show()


}

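When I run it in IntelliJ, however, I get a type mismatch error like the one shown below. I cannot work out why this error occurs, since I am only converting strings to integers. The employee file contains the following input; the records are shown on a single line here, but each one is on its own line in the file.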

1201,satish,25 1202,krishna,28 1203,amith,39 …
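A likely cause, sketched under assumptions: the schema string "id name age" contains no comma, so split(",") produces a single column name, and every field is declared as StringType while the rows carry Int values for id and age. Spark then appears to fail when it tries to treat the integer 1201 as a string. The plain-Scala sketch below needs no Spark; the object name SchemaCheck and the helper parseLine are hypothetical and only illustrate the two mismatches.

```scala
object SchemaCheck {
  val employeeSchemaString = "id name age"

  // No comma in the string, so this yields ONE column name: "id name age".
  val byComma: Array[String] = employeeSchemaString.split(",")

  // Splitting on a space yields the three intended names: id, name, age.
  val bySpace: Array[String] = employeeSchemaString.split(" ")

  // Hypothetical helper mirroring the question's row mapping:
  // id and age are converted to Int, name stays a String.
  def parseLine(line: String): (Int, String, Int) = {
    val fields = line.split(",").map(_.trim)
    (fields(0).toInt, fields(1), fields(2).toInt)
  }
}
```

Once the schema has three columns, the declared types also have to agree with the values: either declare id and age as IntegerType (imported from org.apache.spark.sql.types alongside StringType) or keep the fields as strings by dropping the toInt calls. Separately, registerTempTable returns Unit, so employeeDF in the question does not hold a DataFrame; registering the table in its own statement avoids that.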

scala apache-spark spark-dataframe

Score: 1, answers: 1, views: 4816

Array destructuring syntax

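I am reading the file-handling section of the Scala Cookbook and came across the following code. I tried to run it in my IDE but got an error. Am I missing something? I have never come across this array syntax before.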

import java.io.IOException
import scala.io.{BufferedSource, Source}

object ReadingCSVfile extends App {
  var bufferedSource = None: Option[BufferedSource]
  try {
    bufferedSource =
       Some(
         Source.fromFile(
           "C:\\Users\\pswain\\IdeaProjects\\test1\\src\\main\\resources\\finance.csv")
       )

    for(i <- bufferedSource.get.getLines()) {
      val Array(month, Income, Expenses, Profit) = i.split(",").map(x => x.trim)
      println(s"$month $revenue $expenses $profit")
    }
  } catch {
      case e : IOException => print(e.printStackTrace())
    } finally {bufferedSource.get.close()}
  }
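One reading of the error, offered as a sketch and not as the book's exact code: in a Scala pattern, a simple name that starts with an uppercase letter (Income, Expenses, Profit) is compared against an existing value of that name instead of introducing a new variable, and the println refers to revenue, expenses and profit, which the pattern never binds. Using the same lowercase names in both places should compile. The object name ArrayPatternSketch and the sample line are assumptions standing in for one row of finance.csv.

```scala
object ArrayPatternSketch {
  // Hypothetical sample line; the real file contents are not shown in the question.
  val line = "January, 10000.00, 9000.00, 1000.00"

  // Lowercase names in the pattern are bound as new variables,
  // one per array element, in order.
  val Array(month, revenue, expenses, profit) = line.split(",").map(_.trim)

  // The interpolated names now match the names bound by the pattern.
  def summary: String = s"$month $revenue $expenses $profit"
}
```

The pattern only matches an array of exactly four elements; a line with a different number of fields raises a scala.MatchError at runtime.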

scala pattern-matching

Score: 1, answers: 1, views: 984

The map function on Option in Scala

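I am a little confused by the third println in the code below, where the output is None. As I understand it:

  1. lookupPlayer(3) returns None, which is a subtype of Option[Nothing].
  2. map is then called on None. But how does the map function work on None?

Please help me understand this with a simple example.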

case class Player(name: String)

def lookupPlayer(id: Int): Option[Player] = {
  if (id == 1) Some(new Player("Sean"))
  else if(id == 2) Some(new Player("Greg"))
  else None
}

def lookupScore(player: Player): Option[Int] = {
  if (player.name == "Sean") Some(1000000) else None
}

println(lookupPlayer(1).map(lookupScore))  // Some(Some(1000000))
println(lookupPlayer(2).map(lookupScore))  // Some(None)
println(lookupPlayer(3).map(lookupScore))  // None
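A minimal sketch of why map on None returns None: map only applies the function when a value is present, and otherwise returns the empty case unchanged. The types Maybe, Just and Empty below are a simplified, hypothetical model of Option, not the library's actual source; the last two lines use the real Option.

```scala
object OptionMapSketch {
  // Simplified stand-in for Option (hypothetical names, not library types).
  sealed trait Maybe[+A] {
    def map[B](f: A => B): Maybe[B] = this match {
      case Just(a) => Just(f(a)) // a value is present: apply f and re-wrap
      case Empty   => Empty      // nothing to apply f to: stay empty
    }
  }
  final case class Just[+A](value: A) extends Maybe[A]
  case object Empty extends Maybe[Nothing]

  // The real Option behaves the same way on None: f is never invoked.
  val missing: Option[Int] = None
  val mapped: Option[Int] = missing.map(_ + 1)
}
```

This also explains the nested results in the first two println calls: map wraps whatever lookupScore returns, so an Option[Int] result becomes Option[Option[Int]]. Using flatMap instead, as in lookupPlayer(1).flatMap(lookupScore), yields a flat Option[Int].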

functional-programming scala

Score: 1, answers: 1, views: 1238

RDD map function works differently

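I have the code below. Normally map is a higher-order function that takes a function as its argument and applies it to each element. In this case, however, the argument passed to map is not a function but a value of type Map. I cannot understand how map works here.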

Spark context available as sc (master = yarn-client, app id = application_1473775536920_2711).
SQL context available as sqlContext.

scala> val pws = Map("Apache Spark" -> "http://spark.apache.org/", "Scala" -> "http://www.scala-lang.org/")
pws: scala.collection.immutable.Map[String,String] = Map(Apache Spark -> http://spark.apache.org/, Scala -> http://www.scala-lang.org/)

scala> val websites = sc.parallelize(Seq("Apache Spark", "Scala")).map(pws).collect
16/09/23 02:50:15 WARN util.ClosureCleaner: Expected a closure; got scala.collection.immutable.Map$Map2
[Stage 0:>                                                          (0 + 0) / 2]16/09/23 02:50:31 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers …
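A sketch of what is probably happening, shown with a plain Seq instead of an RDD so that it runs without Spark: a Scala Map[K, V] is itself a function from K to V, so it can be passed wherever a K => V is expected, and map then looks each element up as a key. The object name MapAsFunction is hypothetical; pws and websites follow the question.

```scala
object MapAsFunction {
  val pws: Map[String, String] = Map(
    "Apache Spark" -> "http://spark.apache.org/",
    "Scala" -> "http://www.scala-lang.org/"
  )

  // A Map[String, String] can be assigned to a String => String directly.
  val lookup: String => String = pws

  // Each element is used as a key; the result holds the matching values.
  val websites: Seq[String] = Seq("Apache Spark", "Scala").map(pws)
}
```

An element that is not a key of the map would throw a NoSuchElementException. The ClosureCleaner warning in the log seems to reflect the same point: Spark received a Map object where it expected a compiler-generated closure. The later YarnScheduler warning concerns cluster resources and looks unrelated to the map call.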

scala apache-spark rdd

Score: 0, answers: 1, views: 239