Hat*_*her · scala · apache-spark · apache-spark-sql
I have a csv file containing doubles. When I load it into a DataFrame and try to use the values, I get a message telling me that java.lang.String cannot be cast to java.lang.Double, even though my data is numeric. How do I get a DataFrame from this csv file whose columns are of double type? How should I modify my code?
import org.apache.spark.sql.SparkSession

object Example extends App {
  val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
  // read the CSV and name its nine columns
  val data = spark.read.csv("C://lpsa.data")
    .toDF("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9")
  // keep only the numeric columns
  val data2 = data.select("col2", "col3", "col4", "col5", "col6", "col7")
}
What can I do to convert each of these columns in the DataFrame to double? Thanks.
Use select with cast:
import org.apache.spark.sql.functions.col

data.select(Seq("col2", "col3", "col4", "col5", "col6", "col7").map(
  c => col(c).cast("double")
): _*)
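A minimal end-to-end sketch of this approach, using the column names and file path from the question (printSchema is only there to confirm the resulting types):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CastExample extends App {
  val spark = SparkSession.builder.master("local").appName("cast-example").getOrCreate()
  // Without an explicit schema, spark.read.csv reads every column as StringType.
  val data = spark.read.csv("C://lpsa.data")
    .toDF("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9")
  // Cast the numeric columns to double in a single select.
  val doubles = data.select(
    Seq("col2", "col3", "col4", "col5", "col6", "col7").map(c => col(c).cast("double")): _*
  )
  doubles.printSchema() // every selected column should now be reported as double
}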
Or pass a schema to the reader.
Define the schema:
import org.apache.spark.sql.types._

val cols = Seq(
  "col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9"
)
val doubleCols = Set("col2", "col3", "col4", "col5", "col6", "col7")

val schema = StructType(cols.map(
  c => StructField(c, if (doubleCols contains c) DoubleType else StringType)
))
and use it as the argument to the schema method:
spark.read.schema(schema).csv(path)
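One caveat: with an explicit schema, the CSV reader's default PERMISSIVE mode silently turns values it cannot parse as double into nulls. A short sketch, reusing the schema above, of how to fail loudly instead:

// PERMISSIVE (the default) replaces unparsable values with null;
// FAILFAST throws as soon as a malformed row is encountered.
val strict = spark.read
  .schema(schema)
  .option("mode", "FAILFAST")
  .csv(path)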
You can also use schema inference:
spark.read.option("inferSchema", "true").csv(path)
But it is considerably more expensive, since inference requires an extra pass over the data to determine the column types.
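For completeness, a sketch of the inference route; since the file has no header and toDF is not called, the inferred columns are named _c0 through _c8:

val inferred = spark.read.option("inferSchema", "true").csv("C://lpsa.data")
inferred.printSchema() // numeric columns come back as double if every value parses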