I am running the following code to create a DataFrame from a text file.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.types.{StructType, StringType, StructField}
/**
 * Created by PSwain on 6/19/2016.
 */
object RddToDataframe extends App {
  val scnf = new SparkConf().setAppName("RddToDataFrame").setMaster("local[1]")
  val sc = new SparkContext(scnf)
  val sqlContext = new SQLContext(sc)
  val employeeRdd = sc.textFile("C:\\Users\\pswain\\IdeaProjects\\test1\\src\\main\\resources\\employee")
  // Creating schema
  val employeeSchemaString = "id name age"
  val schema = StructType(employeeSchemaString.split(",").map(colName => StructField(colName, StringType, true)))
  // Creating RowRdd
  val rowRdd = employeeRdd.map(row => row.split(",")).map(row => Row(row(0).trim.toInt, row(1), row(2).trim.toInt))
  // Creating dataframe = RDD[rowRdd] + schema
  val employeeDF = sqlContext.createDataFrame(rowRdd, schema).registerTempTable("Employee")
  sqlContext.sql("select * from Employee").show()
}
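But when I run it in IntelliJ, I get the type mismatch error shown below. I cannot work out what causes this error, since I am only converting strings to integers. The employee file contains the following input; the records appear on one line here, but each is on its own line in the file.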
1201,satish,25 1202,krishna,28 1203,amith,39 …
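One detail worth checking in the code as posted, independent of the truncated error message: the schema string is space-separated but is split on a comma, so it produces one column name rather than three. A minimal plain-Scala sketch of that behaviour follows (no Spark required; `SchemaSplitSketch` and its member names are illustrative and not part of the original code):

```scala
object SchemaSplitSketch {
  val employeeSchemaString = "id name age"

  // There is no comma in the string, so the whole string comes back as one element.
  val splitOnComma: Array[String] = employeeSchemaString.split(",")

  // Splitting on a space yields the three intended column names.
  val splitOnSpace: Array[String] = employeeSchemaString.split(" ")

  // One record from the employee file, parsed the same way as in the question.
  val fields: Array[String] = "1201,satish,25".split(",")
  val id: Int = fields(0).trim.toInt   // an Int, not a String
  val age: Int = fields(2).trim.toInt  // an Int, not a String
}
```

Note also that the schema declares every column as StringType while each Row carries Int values for id and age. Declaring those two fields as IntegerType, or leaving the values as strings, would be one way to make the declared and actual types agree. This is an assumption about the cause, since the error text itself is cut off above.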
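I am reading the Scala Cookbook chapter on file handling and came across the following code. I tried to run it in my IDE, but I get an error. Am I missing something? I have never come across array syntax like this before.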
import java.io.IOException
import scala.io.{BufferedSource, Source}
object ReadingCSVfile extends App {
  var bufferedSource = None: Option[BufferedSource]
  try {
    bufferedSource =
      Some(
        Source.fromFile(
          "C:\\Users\\pswain\\IdeaProjects\\test1\\src\\main\\resources\\finance.csv")
      )
    for (i <- bufferedSource.get.getLines()) {
      val Array(month, Income, Expenses, Profit) = i.split(",").map(x => x.trim)
      println(s"$month $revenue $expenses $profit")
    }
  } catch {
    case e: IOException => print(e.printStackTrace())
  } finally {
    bufferedSource.get.close()
  }
}
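The `val Array(...) = ...` line is a pattern match that destructures the array into named values. In a Scala pattern, a name starting with a lowercase letter binds a new variable, while a name starting with an uppercase letter is read as a reference to an existing value to compare against. The sketch below assumes a four-column line; `ArrayPatternSketch`, `parseLine`, `parseLineSafe`, and the sample rows are hypothetical names chosen for illustration:

```scala
object ArrayPatternSketch {
  // Destructure one CSV line into four trimmed fields.
  // Lowercase names in the pattern introduce new variables.
  // Throws scala.MatchError if the line does not have exactly four fields.
  def parseLine(line: String): (String, String, String, String) = {
    val Array(month, revenue, expenses, profit) = line.split(",").map(_.trim)
    (month, revenue, expenses, profit)
  }

  // Same destructuring as an explicit match, returning None for malformed lines.
  def parseLineSafe(line: String): Option[(String, String, String, String)] =
    line.split(",").map(_.trim) match {
      case Array(month, revenue, expenses, profit) =>
        Some((month, revenue, expenses, profit))
      case _ => None
    }
}
```

With the capitalised names Income, Expenses, and Profit in the pattern, the compiler looks for existing values with those names, and the println then refers to revenue, expenses, and profit, which were never bound.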
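I am a little confused by the third println in the code below, where the output is None. As I understand it:
lookupPlayer(3) returns None, which is a subtype of Option[Nothing], and map is then called on that None. But how does the map function work on None? Please help me understand with a simple example.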
case class Player(name: String)
def lookupPlayer(id: Int): Option[Player] = {
  if (id == 1) Some(new Player("Sean"))
  else if (id == 2) Some(new Player("Greg"))
  else None
}
def lookupScore(player: Player): Option[Int] = {
  if (player.name == "Sean") Some(1000000) else None
}
println(lookupPlayer(1).map(lookupScore)) // Some(Some(1000000))
println(lookupPlayer(2).map(lookupScore)) // Some(None)
println(lookupPlayer(3).map(lookupScore)) // None
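One way to read the three outputs above is to think of map on an Option as a pattern match: for Some(x) it applies the function and re-wraps the result, and for None it returns None without calling the function at all. The sketch below is a simplified stand-in, not the library source; `OptionMapSketch` and `mapOption` are hypothetical names:

```scala
object OptionMapSketch {
  // Simplified stand-in for Option.map.
  def mapOption[A, B](opt: Option[A])(f: A => B): Option[B] = opt match {
    case Some(value) => Some(f(value)) // apply f, then wrap the result again
    case None        => None           // no value, so f is never called
  }

  val empty: Option[Int] = None
  val doubledNone: Option[Int] = mapOption(empty)(_ * 2)
  val doubledSome: Option[Int] = mapOption(Option(21))(_ * 2)

  // flatMap skips the re-wrapping step, so a function that already returns
  // an Option does not produce a nested Some(Some(...)).
  val flat: Option[Int] =
    Option("Sean").flatMap(name => if (name == "Sean") Some(1000000) else None)
}
```

Under this reading, lookupPlayer(3).map(lookupScore) never invokes lookupScore, which is why the result is None rather than Some(None).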
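I have the code below. Normally map is a higher-order function that takes a function as its argument and applies that function to each element. But in this case the argument passed to map is not a function but a value of type Map. I cannot understand how map works here.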
Spark context available as sc (master = yarn-client, app id = application_1473775536920_2711).
SQL context available as sqlContext.
scala> val pws = Map("Apache Spark" -> "http://spark.apache.org/", "Scala" -> "http://www.scala-lang.org/")
pws: scala.collection.immutable.Map[String,String] = Map(Apache Spark -> http://spark.apache.org/, Scala -> http://www.scala-lang.org/)
scala> val websites = sc.parallelize(Seq("Apache Spark", "Scala")).map(pws).collect
16/09/23 02:50:15 WARN util.ClosureCleaner: Expected a closure; got scala.collection.immutable.Map$Map2
[Stage 0:> (0 + 0) / 2]16/09/23 02:50:31 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers …
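In Scala, a Map[K, V] can be used wherever a function K => V is expected, because applying a map to a key looks up that key. Passing `pws` to map therefore has the same effect as passing `key => pws(key)`. The sketch below uses a plain Seq instead of an RDD so that it runs without a cluster; `MapAsFunctionSketch` and its member names are hypothetical:

```scala
object MapAsFunctionSketch {
  val pws: Map[String, String] = Map(
    "Apache Spark" -> "http://spark.apache.org/",
    "Scala" -> "http://www.scala-lang.org/"
  )

  val names: Seq[String] = Seq("Apache Spark", "Scala")

  // The Map itself is passed as the function argument.
  val viaMap: Seq[String] = names.map(pws)

  // Equivalent explicit lambda that performs the same lookup.
  val viaLambda: Seq[String] = names.map(name => pws(name))

  // A Map can also be assigned directly to a function-typed value.
  val asFunction: String => String = pws
}
```

A key that is missing from the map would throw NoSuchElementException when looked up this way. The "Expected a closure" warning in the log above appears consistent with a Map being passed instead of an anonymous function, while the YarnScheduler warning concerns cluster resources and looks unrelated to how map is called.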