sha*_*ams 12 scala apache-spark
I ran into this exception while playing with Spark:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `price` from string to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "price")
- root class: "org.spark.code.executable.Main.Record"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;

How can I resolve this exception? Here is the code:
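The error means Spark will not silently narrow a string column to Int, because the conversion can lose data. A minimal plain-Scala sketch (hypothetical values, no Spark needed) of the kind of loss it guards against:

```scala
// Hypothetical values; plain Scala, just to illustrate the truncation risk.
// A string parses to Int cleanly only when it is an exact integer:
val exact = "1200".toInt              // 1200, no loss
// A decimal string forced through Int silently drops the fraction:
val lossy = "1200.75".toDouble.toInt  // 1200, fraction truncated
println(s"exact=$exact lossy=$lossy")
```

Because Spark cannot prove at analysis time that every row is a clean integer, it refuses the implicit upcast and asks you to supply a schema or an explicit cast instead.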
import java.sql.Timestamp
import org.apache.spark.sql.Encoders

object Main {

  case class Record(transactionDate: Timestamp, product: String, price: Int,
                    paymentType: String, name: String, city: String, state: String,
                    country: String, accountCreated: Timestamp, lastLogin: Timestamp,
                    latitude: String, longitude: String)

  def main(args: Array[String]) {
    System.setProperty("hadoop.home.dir", "C:\\winutils\\")

    val schema = Encoders.product[Record].schema
    val df = SparkConfig.sparkSession.read
      .option("header", "true")
      .csv("SalesJan2009.csv")

    import SparkConfig.sparkSession.implicits._
    val ds = df.as[Record]
    //ds.groupByKey(body => body.state).count().show()

    import org.apache.spark.sql.expressions.scalalang.typed.{
      count => typedCount,
      sum => typedSum
    }

    ds.groupByKey(body => body.state)
      .agg(typedSum[Record](_.price).name("sum(price)"))
      .withColumnRenamed("value", "group")
      .alias("Summary by state")
      .show()
  }
}
Sha*_*ala 31
You can pass the schema when reading the csv file; that will solve the problem:
val spark = SparkSession.builder()
.master("local")
.appName("test")
.getOrCreate()
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Record].schema
val ds = spark.read
.option("header", "true")
.schema(schema) // passing schema
.option("timestampFormat", "MM/dd/yyyy HH:mm") // passing timestamp format
  .csv(path) // csv path
.as[Record] // convert to DS
Along with this, you also need to pass the timestamp format, since your data uses a custom format.

Hope this helps.
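The "MM/dd/yyyy HH:mm" pattern can be sanity-checked outside Spark with java.time, which uses the same pattern letters; a small sketch (the sample value below is hypothetical, not from the actual csv):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Same pattern string the answer passes as timestampFormat.
val fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm")

// Hypothetical sample row value in that custom format:
val ts = LocalDateTime.parse("01/02/2009 06:17", fmt)
println(ts)  // 2009-01-02T06:17
```

If a value in the file does not match the pattern (wrong separator, missing zero-padding for a two-letter field like HH), parsing fails, which is the same mismatch that makes Spark produce nulls or errors for a timestamp column.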