Tags: hive, apache-spark, apache-spark-sql, hive-metastore, hdp
We upgraded our HDP cluster to 3.1.1.3.0.1.0-187 and discovered that Spark can no longer find the Hive databases. Specifically, we see:
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database ... not found
Could you help me understand what happened and how to fix it?
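For reference, a minimal check of what the session can actually resolve (assuming a SparkSession built with enableHiveSupport(), as in the code further below):

// List the databases visible to this SparkSession's catalog.
// If a database that exists in Hive is missing here, Spark is resolving
// names against a different catalog than the one the Hive metastore owns.
spark.sql("SHOW DATABASES").show(false)
spark.catalog.listDatabases().show(false)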
Update:

Configuration:
(spark.sql.warehouse.dir,/warehouse/tablespace/external/hive/)
(spark.admin.acls,)
(spark.yarn.dist.files,file:///opt/folder/config.yml,file:///opt/jdk1.8.0_172/jre/lib/security/cacerts)
(spark.history.kerberos.keytab,/etc/security/keytabs/spark.service.keytab)
(spark.io.compression.lz4.blockSize,128kb)
(spark.executor.extraJavaOptions,-Djavax.net.ssl.trustStore=cacerts)
(spark.history.fs.logDirectory,hdfs:///spark2-history/)
(spark.io.encryption.keygen.algorithm,HmacSHA1)
(spark.sql.autoBroadcastJoinThreshold,26214400)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,true)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.ssl.keyStore,/etc/security/serverKeys/server-keystore.jks)
(spark.yarn.queue,default)
(spark.jars,file:/opt/folder/component-assembly-0.1.0-SNAPSHOT.jar)
(spark.ssl.enabled,true)
(spark.sql.orc.filterPushdown,true)
(spark.shuffle.unsafe.file.output.buffer,5m)
(spark.yarn.historyServer.address,master2.env.project:18481)
(spark.ssl.trustStore,/etc/security/clientKeys/all.jks)
(spark.app.name,com.company.env.component.MyClass)
(spark.sql.hive.metastore.jars,/usr/hdp/current/spark2-client/standalone-metastore/*)
(spark.io.encryption.keySizeBits,128)
(spark.driver.memory,2g)
(spark.executor.instances,10)
(spark.history.kerberos.principal,spark/edge.env.project@ENV.PROJECT)
(spark.unsafe.sorter.spill.reader.buffer.size,1m)
(spark.ssl.keyPassword,*********(redacted))
(spark.ssl.keyStorePassword,*********(redacted))
(spark.history.fs.cleaner.enabled,true)
(spark.shuffle.io.serverThreads,128)
(spark.sql.hive.convertMetastoreOrc,true)
(spark.submit.deployMode,client)
(spark.sql.orc.char.enabled,true)
(spark.master,yarn)
(spark.authenticate.enableSaslEncryption,true)
(spark.history.fs.cleaner.interval,7d)
(spark.authenticate,true)
(spark.history.fs.cleaner.maxAge,90d)
(spark.history.ui.acls.enable,true)
(spark.acls.enable,true)
(spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.executor.memory,2g)
(spark.io.encryption.enabled,true)
(spark.shuffle.file.buffer,1m)
(spark.eventLog.dir,hdfs:///spark2-history/)
(spark.ssl.protocol,TLS)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,3)
(spark.history.ui.port,18081)
(spark.sql.statistics.fallBackToHdfs,true)
(spark.repl.local.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.ssl.trustStorePassword,*********(redacted))
(spark.history.ui.admin.acls,)
(spark.history.kerberos.enabled,true)
(spark.shuffle.io.backLog,8192)
(spark.sql.orc.impl,native)
(spark.ssl.enabledAlgorithms,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
(spark.sql.orc.enabled,true)
(spark.yarn.dist.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.sql.hive.metastore.version,3.0)
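This dump is in the (key,value) shape of SparkConf.getAll; a sketch that reproduces it, assuming an already-running SparkSession named spark:

// Print every effective setting as (key,value), one pair per line.
// Note: unlike the dump above, passwords are not redacted automatically.
spark.sparkContext.getConf.getAll
  .sortBy(_._1)
  .foreach { case (key, value) => println(s"($key,$value)") }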
From hive-site.xml:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/warehouse/tablespace/managed/hive</value>
</property>
The code:
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession
  .builder()
  .appName(getClass.getSimpleName)
  .enableHiveSupport()
  .getOrCreate()
...
dataFrame.write
  .format("orc")
  .options(Map("spark.sql.hive.convertMetastoreOrc" -> true.toString))
  .mode(SaveMode.Append)
  .saveAsTable("name")
The spark-submit command:
--master yarn \
--deploy-mode client \
--driver-memory 2g \
--driver-cores 4 \
--executor-memory 2g \
--num-executors 10 \
--executor-cores 3 \
--conf "spark.dynamicAllocation.enabled=true" \
--conf "spark.shuffle.service.enabled=true" \
--conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=cacerts" \
--conf "spark.sql.warehouse.dir=/warehouse/tablespace/external/hive/" \
--jars postgresql-42.2.2.jar,ojdbc6.jar \
--files config.yml,/opt/jdk1.8.0_172/jre/lib/security/cacerts \
--verbose \
component-assembly-0.1.0-SNAPSHOT.jar
It looks like this is a Spark feature that simply isn't implemented yet. The only way I have found to use Spark with Hive starting from Hive 3.0 is Hortonworks' HiveWarehouseConnector. Documentation is here. There is also good guidance from the Hortonworks community here. I will leave the question unanswered until the Spark developers come up with a solution of their own.
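For illustration, a minimal sketch of the HiveWarehouseConnector approach, under stated assumptions: the HWC assembly jar is on the classpath, spark.sql.hive.hiveserver2.jdbc.url points at HiveServer2 Interactive, and the table name "name" reuses the example above:

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession; reads and
// writes go through HiveServer2 into the Hive 3 catalog instead of
// Spark's own catalog.
val hive = HiveWarehouseSession.session(spark).build()

// Databases as Hive sees them -- the ones missing from spark.catalog.
hive.showDatabases().show(false)

// Append the DataFrame to a Hive table through the connector instead of
// the plain saveAsTable shown above.
dataFrame.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "name")
  .mode(SaveMode.Append)
  .save()

On HDP 3.x the connector ships under /usr/hdp/current/hive_warehouse_connector/, and the assembly jar (exact file name varies by build) would be added to the spark-submit command above via --jars.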