In my project, the external library is spark-assembly-1.3.1-hadoop2.6.0. When I type '.', the IDE suggests toDF(), but at compile time it reports "cannot resolve symbol toDF()". Unfortunately, I cannot find any documentation for toDF() in Apache Spark.
case class Feature(name: String, value: Double, time: String, period: String)

val RESRDD = RDD.map(tuple => {
  // Assuming the input tuple carries (name, value, time, period) in order.
  val bson = new BasicBSONObject()
  bson.put("name", tuple._1)
  bson.put("value", tuple._2)
  bson.put("time", tuple._3)
  bson.put("period", tuple._4)
  (null, bson)
})

RESRDD
  .map(_._2)
  .map(f => Feature(
    f.get("name").toString,
    f.get("value").toString.toDouble,
    f.get("time").toString,
    f.get("period").toString))
  .toDF() // cannot resolve symbol
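In Spark 1.3, toDF() is not a method on RDD itself; it is added by an implicit conversion from SQLContext.implicits, so the enclosing scope normally needs something like the following sketch (assuming an existing SparkContext named sc):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // brings toDF() into scope for RDDs of case classes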
I am trying to modify the behavior of List.toString depending on its type parameter. Since List cannot be extended, it is wrapped in a custom class CList (possibly via an implicit conversion, but the problem would stay the same?). The problem appears when printing CLists of CLists. Here is an example, with the corresponding output shown in the comments:
object Foo {
  import scala.reflect.runtime.universe._

  class CList[A: TypeTag](val l: List[A]) {
    override def toString = typeOf[A] match {
      case t if t =:= typeOf[Char] => l.mkString
      case _ => "[" + l.mkString(", ") + "]"
    }
  }
}
import Foo.CList
val c = new CList(List(1, 2)) // prints "[1, 2]"
println(c)
val c2 = new CList(List('a', 'b')) // prints "ab"
println(c2)
val c3 = new CList(List(
  List(1, 2),
  List(3, …
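One aside from me, not from the original post: the A: TypeTag context bound means the element type must be statically known wherever a CList is constructed, so a generic helper has to carry its own tag. A minimal sketch:

import Foo.CList
import scala.reflect.runtime.universe._

// Without the TypeTag[B] context bound, new CList(xs) would not compile here.
def wrap[B: TypeTag](xs: List[B]): CList[B] = new CList(xs)

println(wrap(List('x', 'y'))) // "xy": the Char branch still fires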
I have some Spark Scala code that runs without problems in spark-shell. The crux of the question is in these few lines; I want to append a row to a dataframe:
object SparkPipeline {
  def main(args: Array[String]) {
    val spark = (SparkSession
      .builder()
      .appName("SparkPipeline")
      .getOrCreate()
    )
    val df = (spark
      .read
      .format("com.databricks.spark.avro")
      .load(DATA_PATH)
    )

    case class DataRow(field1: String, field2: String)

    val row_df = Seq(DataRow("FOO", "BAR")).toDF() // THIS FAILS
    val df_augmented = df.union(row_df)
    //
    // Additional code here
    //
  }
}
However, when I package it into a jar with sbt, the build fails with the following error:
value toDF is not a member of Seq[DataRow]
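As context (my reading, not stated in the post): toDF() on a local Seq is not a method of Seq itself; it is injected by the session implicits, which spark-shell imports automatically on startup. That is why the same code compiles in the shell but not under sbt; the method body would need:

import spark.implicits._ // after the SparkSession has been created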
I tried to follow this question:
val spark = (SparkSession
  .builder()
  .appName("TrainSimpleRF")
  .getOrCreate()
)
val sc = spark.sparkContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import …
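For reference, a minimal sketch of the usual fix (an assumption on my part rather than a confirmed answer: it supposes the only problems are the missing implicits import and the case class being local to main, which prevents an Encoder from being derived for it):

import org.apache.spark.sql.SparkSession

// Defined at the top level: Spark cannot derive an Encoder for a case class
// declared inside a method.
case class DataRow(field1: String, field2: String)

object SparkPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkPipeline").getOrCreate()
    import spark.implicits._ // provides toDF() on local Seqs and RDDs

    val row_df = Seq(DataRow("FOO", "BAR")).toDF()
    row_df.show()
  }
}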