own*_*so4 (5) · csv, scala, apache-spark
I have the following case class:
case class OrderDetails(OrderID: String, ProductID: String, UnitPrice: Double,
                        Qty: Int, Discount: Double)
I am trying to read this CSV: https://github.com/xsankar/fdps-v3/blob/master/data/NW-Order-Details.csv
This is my code:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master(sparkMaster).appName(sparkAppName).getOrCreate()
import spark.implicits._
val orderDetails = spark.read.option("header", "true").csv(inputFiles + "NW-Order-Details.csv").as[OrderDetails]
The error is:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot up cast `UnitPrice` from string to double as it may truncate
The type path of the target object is:
- field (class: "scala.Double", name: "UnitPrice")
- root class: "es.own3dh2so4.OrderDetails"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
Why can't it be cast if all the fields are "double" values? What am I not understanding?
Spark version 2.1.0, Scala version 2.11.7.
Vid*_*dya (11)
You just need to cast the field explicitly to a Double:
import org.apache.spark.sql.types.DoubleType

val orderDetails = spark.read
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .withColumn("unitPrice", 'UnitPrice.cast(DoubleType))
  .as[OrderDetails]
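Why the cast is needed: unless a schema is supplied or inferred, Spark's CSV reader types every column as a string, so the analyzer refuses the implicit string-to-double upcast as potentially lossy. A quick sketch to confirm this (same read as above, with the question's inputFiles prefix assumed):

// Without a schema, every CSV column comes back as StringType.
spark.read
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .printSchema()
// Prints something like:
// root
//  |-- OrderID: string (nullable = true)
//  |-- UnitPrice: string (nullable = true)
//  ...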
Also, by Scala (and Java) convention, your case class constructor parameters should be lower camel case:
case class OrderDetails(orderID: String,
                        productID: String,
                        unitPrice: Double,
                        qty: Int,
                        discount: Double)
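Note that Qty and Discount come back as strings too, so they would presumably trip the same upcast error. A sketch of the full read with all three columns cast (a guess at the complete pipeline, reusing the question's inputFiles prefix and its spark.implicits._ import for the 'col syntax):

import org.apache.spark.sql.types.{DoubleType, IntegerType}

val orderDetails = spark.read
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .withColumn("unitPrice", 'UnitPrice.cast(DoubleType))  // string -> double
  .withColumn("qty", 'Qty.cast(IntegerType))             // string -> int
  .withColumn("discount", 'Discount.cast(DoubleType))    // string -> double
  .as[OrderDetails]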
小智 (7)
If we want to change the data type of multiple columns, chaining withColumn gets ugly. A better way to apply a schema to the data is:
import org.apache.spark.sql.Encoders

val caseClassSchema = Encoders.product[OrderDetails].schema
val data = spark.read.schema(caseClassSchema).option("header", "true").csv(inputFiles + "NW-Order-Details.csv")
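With the schema applied up front, the columns already carry the case-class types, so the typed conversion should work without any casts; a brief follow-up (assuming the data value above):

// No upcast error now: the schema types match OrderDetails exactly.
val typedOrders = data.as[OrderDetails]
typedOrders.show(5)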