Spark Scala: Cannot up cast from string to int as it may truncate

sha*_*ams · scala, apache-spark

I ran into this exception while playing around with Spark.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `price` from string to int as it may truncate.
The type path of the target object is:
- field (class: "scala.Int", name: "price")
- root class: "org.spark.code.executable.Main.Record"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;

How do I fix this exception? Here is the code:

import java.sql.Timestamp

import org.apache.spark.sql.Encoders

object Main {

  case class Record(transactionDate: Timestamp, product: String, price: Int, paymentType: String, name: String, city: String, state: String, country: String,
                    accountCreated: Timestamp, lastLogin: Timestamp, latitude: String, longitude: String)

  def main(args: Array[String]): Unit = {

    System.setProperty("hadoop.home.dir", "C:\\winutils\\")

    val schema = Encoders.product[Record].schema

    val df = SparkConfig.sparkSession.read
      .option("header", "true")
      .csv("SalesJan2009.csv")

    import SparkConfig.sparkSession.implicits._
    val ds = df.as[Record]

    //ds.groupByKey(body => body.state).count().show()

    import org.apache.spark.sql.expressions.scalalang.typed.{
      count => typedCount,
      sum => typedSum
    }

    ds.groupByKey(body => body.state)
      .agg(typedSum[Record](_.price).name("sum(price)"))
      .withColumnRenamed("value", "group")
      .alias("Summary by state")
      .show()
  }
}

Sha*_*ala

You can pass the schema when reading the CSV file; that fixes the problem:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("test")
  .getOrCreate()

import org.apache.spark.sql.Encoders
val schema = Encoders.product[Record].schema

val ds = spark.read
  .option("header", "true")
  .schema(schema) // pass the schema explicitly
  .option("timestampFormat", "MM/dd/yyyy HH:mm") // timestamp format used in the file
  .csv(path) // path to the CSV file
  .as[Record] // convert to a Dataset[Record]

Along with the schema, you also need to pass the timestamp format, since the file uses a custom one.
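For background on why Spark refuses the conversion in the first place: without an explicit schema, `spark.read.csv` types every column as a string, and Spark will not silently up cast a string column to `Int` because that coercion can lose data. A minimal plain-Scala sketch (no Spark needed, with a hypothetical decimal value) illustrates the hazard:

```scala
// Why the up cast is refused: coercing a string to Int can lose data,
// so Spark demands an explicit cast or a user-provided schema instead.
object TruncationDemo extends App {
  val clean = "1200"
  println(clean.toInt) // parses losslessly: 1200

  // A decimal price would have to be truncated to fit an Int;
  // Scala's .toInt refuses to parse it at all.
  val decimal = "1200.50"
  println(scala.util.Try(decimal.toInt).isFailure) // true
}
```

Supplying the schema up front (as above) sidesteps this: each column is parsed directly into the declared type while the file is read.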

Hope this helps.

  • Nice! I didn't know the schema could be extracted with `Encoders.product[MyCaseClass].schema`.