Don*_*hew 4 java apache-spark apache-spark-sql
Please see the code below:
//Create Spark Context
SparkConf sparkConf = new SparkConf().setAppName("TestWithObjects").setMaster("local");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
//Creating RDD
JavaRDD<Person> personsRDD = javaSparkContext.parallelize(persons);
//Creating SQL context
SQLContext sQLContext = new SQLContext(javaSparkContext);
DataFrame personDataFrame = sQLContext.createDataFrame(personsRDD, Person.class);
personDataFrame.show();
personDataFrame.printSchema();
personDataFrame.select("name").show();
personDataFrame.registerTempTable("peoples");
DataFrame result = sQLContext.sql("SELECT * FROM peoples WHERE name='test'");
result.show();
After this, I need to convert the DataFrame 'result' into a Person object or a List. Thanks in advance.
小智 9
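DataFrame is simply a type alias of Dataset[Row]. Operations on it are also referred to as "untyped transformations", in contrast to the "typed transformations" that come with strongly typed Scala/Java Datasets.
Converting from Dataset[Row] to Dataset[Person] is very simple in Spark: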
DataFrame result = sQLContext.sql("SELECT * FROM peoples WHERE name='test'");
At this point, Spark has converted your data into a DataFrame = Dataset[Row], a collection of generic Row objects, since it does not know the exact type.
// Create an Encoder for the Person Java bean
Encoder<Person> personEncoder = Encoders.bean(Person.class);
Dataset<Person> personDF = result.as(personEncoder);
personDF.show();
Now Spark converts Dataset[Row] -> Dataset[Person]: type-specific Scala/Java JVM objects, as dictated by the Person class.
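The question does not show the Person class, so here is a minimal sketch of what it might look like. Encoders.bean expects a JavaBean: a no-argument constructor plus a getter/setter pair for each property. The `name` property comes from the question's `select("name")`; the `age` property is purely an assumption for illustration.

```java
import java.io.Serializable;

// Hypothetical Person bean; only "name" is known from the question.
// In a real project, declare it public (Spark's own examples use public bean classes).
class Person implements Serializable {
    private String name;
    private int age; // assumed field, not in the original question

    // A no-argument constructor is required for bean-style instantiation
    public Person() {
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
```

With a bean like this, calling `personDF.collectAsList()` on the typed Dataset should give a `java.util.List<Person>` on the driver, which is what the question asks for. This pulls every matching row into driver memory, so it is only suitable for small results.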
For more details, see the following link provided by Databricks