Spark Logging class not found when using Spark SQL

kha*_*eeb 5 java maven apache-spark

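I am trying to write a simple Spark SQL program in Java. The program reads data from a Cassandra table, converts the RDD into a Dataset, and displays the data. When I run the spark-submit command, I get the error: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging.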

My program is:

SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
        .set("spark.cassandra.connection.host", "abc")
        .set("spark.cassandra.auth.username", "def")
        .set("spark.cassandra.auth.password", "ghi");
SparkContext sparkContext = new SparkContext(sparkConf);
JavaRDD<Log> logsRDD = javaFunctions(sparkContext).cassandraTable("test", "log",
        mapRowTo(Log.class));
SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, Log.class);
logsDF.show();

My POM dependencies are:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>   
</dependencies>

My spark-submit command is: /home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar

How can I fix this error? Downgrading to 1.5.2 does not work, because 1.5.2 has neither org.apache.spark.sql.Dataset nor org.apache.spark.sql.SparkSession.

Sac*_*wgi 0

Spark Logging is available in Spark 1.5.2 and earlier versions, but not in later versions. So the dependencies in your pom.xml should look like this:

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.2</version>
  </dependency>   
</dependencies>
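
To see which Logging class the jars on the classpath actually provide, a small check along these lines may help. This is only a sketch: the class name LoggingClassCheck and the helper isOnClasspath are hypothetical and not part of Spark. It relies on the fact that Spark 1.x ships the class as org.apache.spark.Logging, while Spark 2.x ships it as org.apache.spark.internal.Logging.

```java
public class LoggingClassCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by this class's class loader, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, LoggingClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Location of the Logging trait in Spark 1.x
        System.out.println("org.apache.spark.Logging present: "
                + isOnClasspath("org.apache.spark.Logging"));
        // Location of the Logging trait in Spark 2.x
        System.out.println("org.apache.spark.internal.Logging present: "
                + isOnClasspath("org.apache.spark.internal.Logging"));
    }
}
```

Running it with the same classpath as the application (for example, packaged into the same jar and launched through the same spark-submit installation) shows which Spark generation is visible at runtime, and therefore whether the Spark and connector versions in the POM agree with it.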

Please let me know whether it works.