I am trying to create a DataFrame with Spark, using the code below.
First, I import the following:
import org.apache.spark.sql.types._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.spark.sql.Row
import scala.collection.mutable.ListBuffer
import org.apache.spark.util._
import org.apache.spark.sql.types.IntegerType
Then I try to create the rows and the schema for the DataFrame, as follows:
val Employee = Seq(Row("Kim","Seoul","1000000"),Row("Lee","Busan","2000000"),Row("Park","Jeju","3000000"),Row("Jeong","Daejon","3400000"))
val EmployeeSchema = List(StructField("Name", StringType, true), StructField("City", StringType, true), StructField("Salary", IntegerType, true))
val EmpDF = spark.createDataFrame(spark.sparkContext.parallelize(Employee),StructType(EmployeeSchema))
Finally, I try to check whether the DataFrame works by calling
EmpDF.show
I get the following error:
ERROR Executor: Exception in task 2.0 in stage 1.0 (TID 3)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException:
java.lang.String is not a valid external type for schema of int
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class …
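
For reference, the message "java.lang.String is not a valid external type for schema of int" suggests that the Salary values in the rows are strings ("1000000") while the schema declares that column as IntegerType. A minimal sketch of one way to make the two agree, assuming the salaries are meant to be integers, is to put Int literals in the rows. The object name EmployeeExample, the helper buildEmployees, and the local SparkSession are assumptions added for illustration; in spark-shell the session already exists as spark.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical wrapper object, added only so the sketch compiles on its own.
object EmployeeExample {

  // Builds the employee DataFrame with row values that match the schema types.
  def buildEmployees(spark: SparkSession): DataFrame = {
    // Salary is written as an Int literal (no quotes) so it matches IntegerType.
    val employee = Seq(
      Row("Kim", "Seoul", 1000000),
      Row("Lee", "Busan", 2000000),
      Row("Park", "Jeju", 3000000),
      Row("Jeong", "Daejon", 3400000)
    )

    val employeeSchema = List(
      StructField("Name", StringType, true),
      StructField("City", StringType, true),
      StructField("Salary", IntegerType, true)
    )

    spark.createDataFrame(
      spark.sparkContext.parallelize(employee),
      StructType(employeeSchema)
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("EmployeeExample")
      .getOrCreate()

    buildEmployees(spark).show()
    spark.stop()
  }
}
```

If the salaries have to stay as strings in the input, another option would be to declare Salary as StringType in the schema and convert the column afterwards, for example with col("Salary").cast(IntegerType) from org.apache.spark.sql.functions.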