Java - Spark SQL DataFrame map function is not working

use*_*330 5 java sql apache-spark map-function

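In Spark SQL, I get an error when I try to use the map function on a DataFrame:

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I am following the Spark 1.3 documentation: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection

Is there any solution?

Here is my test code.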

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

小智 12

Change it to:

Java 6 & 7

List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

Java 8

List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();

Once you call javaRDD(), it works just like any other RDD map function.

This works with Spark 1.3.0 and later.
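To illustrate why the original call is rejected while the javaRDD() version compiles, here is a minimal sketch that runs without Spark. Every type in it (ScalaFunction1, JavaFunction, FakeJavaRDD, FakeDataFrame) is a hypothetical stand-in I made up for this example, not Spark's real API. It only models the mismatch as I understand it: DataFrame.map expects a Scala function plus a class tag, whereas the RDD returned by javaRDD() accepts the Java-friendly Function interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JavaRddMapSketch {

    // Hypothetical stand-in for scala.Function1, the type DataFrame.map expects.
    interface ScalaFunction1<T, R> {
        R apply(T value);
    }

    // Hypothetical stand-in for Spark's Java Function interface used in the question.
    interface JavaFunction<T, R> {
        R call(T value) throws Exception;
    }

    // Hypothetical stand-in for JavaRDD: its map accepts the Java function type.
    static class FakeJavaRDD<T> {
        private final List<T> rows;

        FakeJavaRDD(List<T> rows) {
            this.rows = rows;
        }

        <R> FakeJavaRDD<R> map(JavaFunction<T, R> f) {
            List<R> out = new ArrayList<R>();
            for (T row : rows) {
                try {
                    out.add(f.call(row));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return new FakeJavaRDD<R>(out);
        }

        List<T> collect() {
            return new ArrayList<T>(rows);
        }
    }

    // Hypothetical stand-in for DataFrame: map wants the Scala function type and a tag.
    static class FakeDataFrame {
        private final List<String> names;

        FakeDataFrame(List<String> names) {
            this.names = names;
        }

        <R> List<R> map(ScalaFunction1<String, R> f, Class<R> tag) {
            List<R> out = new ArrayList<R>();
            for (String name : names) {
                out.add(tag.cast(f.apply(name)));
            }
            return out;
        }

        FakeJavaRDD<String> javaRDD() {
            return new FakeJavaRDD<String>(names);
        }
    }

    static List<String> teenagerNames(List<String> names) {
        FakeDataFrame teenagers = new FakeDataFrame(names);
        // Passing a JavaFunction straight to teenagers.map(...) would not compile:
        // it is not a ScalaFunction1, and no tag argument is supplied.
        return teenagers.javaRDD().map(
            new JavaFunction<String, String>() {
                public String call(String name) {
                    return "Name: " + name;
                }
            }).collect();
    }

    public static void main(String[] args) {
        System.out.println(teenagerNames(Arrays.asList("Justin")));
    }
}
```

In this model, the anonymous JavaFunction is accepted only after javaRDD() is called, which mirrors the fix above. With real Spark, the same shape applies to Row values instead of plain strings.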


uru*_*rug 0

Do you have the correct dependency set in your pom.xml? Set this and try again:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>