Start HiveThriftServer programmatically in Python

Question

Rav*_*nan 3 python hive scala thrift hivecontext

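In spark-shell (Scala), we import org.apache.spark.sql.hive.thriftserver._ and start the Hive Thrift server programmatically for a particular Hive context with HiveThriftServer2.startWithContext(hiveContext), which exposes the temporary tables registered in that particular session.

How can we do the same thing using Python? Is there a package/API in Python for importing HiveThriftServer? Any other thoughts or suggestions are appreciated.

We have already created a DataFrame using pyspark.

Thanks

Ravi Narayanan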

Answer 1

Sas*_*han 5

You can import it using the py4j Java gateway. The following code works with Spark 2.0.2, and temp tables registered in the Python script can then be queried through beeline.

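from py4j.java_gateway import java_import
from pyspark.sql import SparkSession

# app_name and master must already be defined (your application name and master URL).
spark = SparkSession \
        .builder \
        .appName(app_name) \
        .master(master) \
        .enableHiveSupport() \
        .config('spark.sql.hive.thriftServer.singleSession', True) \
        .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel('INFO')

# Import the Thrift server class into the py4j JVM view. This needs sc, so it runs after the session exists.
java_import(sc._gateway.jvm, "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2")

# Start the Thrift server in the JVM, passing the JVM-side context that backs this PySpark session.
sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark._jwrapped)

spark.sql('CREATE TABLE myTable')
data_file = "path to csv file with data"
dataframe = spark.read.option("header", "true").csv(data_file).cache()
# Register the DataFrame as a temp view; it is visible only within this Spark session.
dataframe.createOrReplaceTempView("myTempView")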
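
One practical point when the code above runs as a standalone script rather than in an interactive shell: the Thrift server lives in the driver JVM, so the Python process has to stay alive for beeline to connect. A minimal sketch of one way to block the script, assuming nothing beyond the standard library; the helper name `keep_alive` is hypothetical and not part of PySpark:

```python
import threading


def keep_alive(stop_event, poll_seconds=1.0):
    """Block the calling thread until stop_event is set or Ctrl+C is pressed.

    Returns True once the event is set.
    """
    try:
        # Event.wait returns False on timeout, so keep polling until it is set.
        while not stop_event.wait(poll_seconds):
            pass
    except KeyboardInterrupt:
        # Treat Ctrl+C as a normal request to shut down.
        stop_event.set()
    return stop_event.is_set()


# Hypothetical usage at the end of the script, after startWithContext(...):
# keep_alive(threading.Event())
```

Using an event instead of a bare sleep loop lets another thread end the wait cleanly by calling `set()` on it.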

Then go to beeline and check that the server started correctly:

in terminal> $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
beeline> show tables;

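It should list the tables and temp tables/views created in Python, including "myTable" and "myTempView" from above. Temp views are visible only when the Thrift server shares the same Spark session that created them.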
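
If beeline cannot connect, it can help to first confirm that something is listening on the Thrift port at all. A small sketch of such a check, assuming the host and port from the beeline URL above (localhost:10000); the helper name `is_port_open` is hypothetical, and the check only proves a TCP listener exists, not that it is HiveThriftServer2:

```python
import socket


def is_port_open(host="localhost", port=10000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` from a separate Python process while the script is running gives a quick yes/no answer before you debug the JDBC connection itself.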

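(See the answer to: Avoid starting HiveThriftServer2 with created context programmatically.
Note: Hive tables can be accessed even if the Thrift server is started from the terminal and connected to the same metastore, but temp views cannot, because they exist only in the Spark session and are not written to the metastore.)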