I have a simple Java application that can connect to and query my cluster using Hive or Impala, with code like this:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
...
Class.forName("com.cloudera.hive.jdbc41.HS2Driver");
Connection con = DriverManager.getConnection("jdbc:hive2://myHostIP:10000/mySchemaName;hive.execution.engine=spark;AuthMech=1;KrbRealm=myHostIP;KrbHostFQDN=myHostIP;KrbServiceName=hive");
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("select * from foobar");
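As an aside, the JDBC URL above packs the host, port, schema, and Kerberos settings into a single string. A small helper (the class and method names here are purely illustrative) makes those pieces easier to keep straight:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveJdbcUrl {
    // Assemble a hive2 JDBC URL from its parts; session properties are
    // appended as ;key=value pairs, which is the format the Hive JDBC
    // drivers expect.
    public static String build(String host, int port, String schema,
                               Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port)
                .append('/').append(schema);
        for (Map.Entry<String, String> p : props.entrySet()) {
            url.append(';').append(p.getKey()).append('=').append(p.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("AuthMech", "1");          // 1 = Kerberos in the Cloudera driver
        props.put("KrbServiceName", "hive");
        System.out.println(build("myHostIP", 10000, "mySchemaName", props));
    }
}
```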
But now I want to try the same query using Spark SQL. I'm having a hard time figuring out how to use the Spark SQL API, specifically how to set up the connection. I've seen examples of how to set up a Spark session, but it's not clear to me what values I need to provide, e.g.
SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate();
How do I tell Spark SQL what host and port to use, what schema to use, and which authentication technique I'm using? For example, I'm authenticating with Kerberos.
The Spark SQL code above comes from https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
Update:
I was able to make some progress, and I think I've figured out how to tell the Spark SQL connection what host and port to use.
...
SparkSession spark = SparkSession
    .builder()
    .master("spark://myHostIP:10000")
    .appName("Java Spark Hive Example")
    .enableHiveSupport()
    .getOrCreate();
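For what it's worth, once a session with Hive support is created, the query itself would go through spark.sql rather than JDBC. A sketch only, assuming the session above actually connects and the table lives in mySchemaName:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkHiveQuery {
    public static void main(String[] args) {
        // Sketch: assumes a reachable Spark master and Hive metastore.
        SparkSession spark = SparkSession
                .builder()
                .master("spark://myHostIP:10000")
                .appName("Java Spark Hive Example")
                .enableHiveSupport()
                .getOrCreate();

        // With enableHiveSupport(), Hive tables are addressed as schema.table
        Dataset<Row> rows = spark.sql("select * from mySchemaName.foobar");
        rows.show();
        spark.stop();
    }
}
```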
I added the following dependency to my pom.xml file:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
With this update I can see the connection getting further, but now it appears to fail because I'm not authenticated. I need to figure out how to authenticate with Kerberos. Here is the relevant log data:
2017-12-19 11:17:55.717 INFO 11912 --- [o-auto-1-exec-1] org.apache.spark.util.Utils : Successfully started service 'SparkUI' on port 4040.
2017-12-19 11:17:55.717 INFO 11912 --- [o-auto-1-exec-1] org.apache.spark.ui.SparkUI : Bound SparkUI to 0.0.0.0, and started at http://myHostIP:4040
2017-12-19 11:17:56.065 INFO 11912 --- [er-threadpool-0] s.d.c.StandaloneAppClient$ClientEndpoint : Connecting to master spark://myHostIP:10000...
2017-12-19 11:17:56.260 INFO 11912 --- [pc-connection-0] o.a.s.n.client.TransportClientFactory : Successfully created connection to myHostIP:10000 after 113 ms (0 ms spent in bootstraps)
2017-12-19 11:17:56.354 WARN 11912 --- [huffle-client-0] o.a.s.n.server.TransportChannelHandler : Exception in connection from myHostIP:10000
java.io.IOException: An existing connection was forcibly closed by the remote host
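One common pattern for Kerberos in this situation (a sketch only; the principal, keytab path, and krb5.conf location are all placeholders, not values from the question) is to log in through Hadoop's UserGroupInformation before building the SparkSession, so the driver already holds a Kerberos ticket when it contacts the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Tell the JVM where the cluster's Kerberos config lives
        // (this path is an assumption; it is the usual Linux default).
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");

        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab, for illustration only.
        UserGroupInformation.loginUserFromKeytab(
                "myUser@MY.REALM", "/path/to/myUser.keytab");

        // ...then build the SparkSession as in the update above.
    }
}
```

When the job is launched through spark-submit instead of run as a plain Java program, the rough equivalent is the --principal and --keytab options.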