I am loading a CSV file into a DataFrame as shown below.
val conf=new SparkConf().setAppName("dataframes").setMaster("local")
val sc=new SparkContext(conf)
val spark=SparkSession.builder().getOrCreate()
import spark.implicits._
val df = spark.
read.
format("org.apache.spark.csv").
option("header", true).
csv("/home/cloudera/Book1.csv")
scala> df.printSchema()
root
|-- name: string (nullable = true)
|-- address: string (nullable = true)
|-- age: string (nullable = true)
How can I change the age column to type Int?
How can I select multiple columns of a dataset ds in Spark 2.3 Java by passing a list argument?
For example, this works fine:
ds.select("col1","col2","col3").show();
However, this fails:
List<String> columns = Arrays.asList("col1","col2","col3");
ds.select(columns.toString()).show();
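One possible reading of the failure: `columns.toString()` produces the single string `[col1, col2, col3]`, so `select` receives one argument rather than three column names. The sketch below is plain Java and does not start Spark; it only shows how a list could be split to fit the `select(String col, String... cols)` overload of `Dataset`. The class name `SelectColumnsSketch` and the helpers `firstColumn` and `remainingColumns` are hypothetical names introduced for illustration, and `ds` is assumed to be an existing `Dataset<Row>`.

```java
import java.util.Arrays;
import java.util.List;

public class SelectColumnsSketch {

    // Hypothetical helper: the first name, for the `col` parameter
    // of Dataset.select(String col, String... cols).
    static String firstColumn(List<String> columns) {
        return columns.get(0);
    }

    // Hypothetical helper: the remaining names as an array,
    // which Java accepts in place of the varargs parameter `cols`.
    static String[] remainingColumns(List<String> columns) {
        return columns.subList(1, columns.size()).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("col1", "col2", "col3");

        // toString() collapses the list into one string, not three names.
        System.out.println(columns.toString()); // prints [col1, col2, col3]

        String first = firstColumn(columns);
        String[] rest = remainingColumns(columns);
        System.out.println(first + " / " + Arrays.toString(rest));

        // Assumption: `ds` is a Dataset<Row> already in scope (not built here).
        // The call would then be:
        // ds.select(first, rest).show();
    }
}
```

An alternative, under the same assumptions, is to map each name to a `Column` with `org.apache.spark.sql.functions.col` and pass the resulting `Column[]` to `select(Column... cols)`. Both approaches assume the list is non-empty.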