Flink with Iceberg Catalog and Hive Metastore: org.apache.hadoop.fs.s3a.S3AFileSystem not found

Rob*_*att 5 apache-flink hive-metastore flink-sql apache-iceberg

I'm trying to set up Flink SQL with an Apache Iceberg catalog and the Hive Metastore, without success. Below are the steps I've taken against a clean Flink 1.18.1 installation, and the error I'm getting.

Setting up the components

Run the Hive Metastore:

docker run --rm --detach --name hms-standalone \
           --publish 9083:9083 \
           ghcr.io/recap-build/hive-metastore-standalone:latest

Run MinIO with Docker:

docker run --rm --detach --name minio \
            -p 9001:9001 -p 9000:9000 \
            -e "MINIO_ROOT_USER=admin" \
            -e "MINIO_ROOT_PASSWORD=password" \
            minio/minio server /data --console-address ":9001"

Provision a bucket:

docker exec minio \
    mc config host add minio http://localhost:9000 admin password
docker exec minio \
    mc mb minio/warehouse

Add the required MinIO configuration to ./conf/flink-conf.yaml:

cat >> ./conf/flink-conf.yaml <<EOF
fs.s3a.access.key: admin
fs.s3a.secret.key: password
fs.s3a.endpoint: http://localhost:9000
fs.s3a.path.style.access: true
EOF

Adding the JARs to Flink

Flink's S3 plugin:

mkdir ./plugins/s3-fs-hadoop
cp ./opt/flink-s3-fs-hadoop-1.18.1.jar ./plugins/s3-fs-hadoop/

Flink's Hive connector:

mkdir -p ./lib/hive
curl -s https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-3.1.3_2.12/1.18.1/flink-sql-connector-hive-3.1.3_2.12-1.18.1.jar -o ./lib/hive/flink-sql-connector-hive-3.1.3_2.12-1.18.1.jar

Iceberg's dependencies:

mkdir ./lib/iceberg
curl https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.17/1.4.3/iceberg-flink-runtime-1.17-1.4.3.jar -o ./lib/iceberg/iceberg-flink-runtime-1.17-1.4.3.jar

mkdir -p ./lib/aws
curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.6/hadoop-aws-3.3.6.jar -o ./lib/aws/hadoop-aws-3.3.6.jar

Running it

Set up the Hadoop dependencies:

export HADOOP_CLASSPATH=$(~/hadoop/hadoop-3.3.2/bin/hadoop classpath)

Launch the SQL client:

./bin/sql-client.sh

Flink SQL> CREATE CATALOG c_iceberg_hive WITH (
>    'type' = 'iceberg',
>    'client.assume-role.region' = 'us-east-1',
>    'warehouse' = 's3a://warehouse',
>    's3.endpoint' = 'http://localhost:9000',
>    's3.path-style-access' = 'true',
>    'catalog-type'='hive',
>    'uri'='thrift://localhost:9083');
[INFO] Execute statement succeed.

Flink SQL> USE CATALOG c_iceberg_hive;
[INFO] Execute statement succeed.

Flink SQL> CREATE DATABASE db_rmoff;
[ERROR] Could not execute SQL statement. Reason:
MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)

Flink SQL>

Full stack trace

Caused by: org.apache.flink.table.gateway.service.utils.SqlExecutionException: Failed to execute the operation b685c995-3280-4a9e-b6c0-18ab9369d790.
    at org.apache.flink.table.gateway.service.operation.OperationManager$Operation.processThrowable(OperationManager.java:414)
    at org.apache.flink.table.gateway.service.operation.OperationManager$Operation.lambda$run$0(OperationManager.java:267)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    ... 1 more
Caused by: org.apache.flink.table.api.TableException: Could not execute CREATE DATABASE: (catalogDatabase: [{}], catalogName: [c_iceberg_hive], databaseName: [db_rmoff], ignoreIfExists: [
    at org.apache.flink.table.operations.ddl.CreateDatabaseOperation.execute(CreateDatabaseOperation.java:90)
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1092)
    at org.apache.flink.table.gateway.service.operation.OperationExecutor.callOperation(OperationExecutor.java:556)
    at org.apache.flink.table.gateway.service.operation.OperationExecutor.executeOperation(OperationExecutor.java:444)
    at org.apache.flink.table.gateway.service.operation.OperationExecutor.executeStatement(OperationExecutor.java:207)
    at org.apache.flink.table.gateway.service.SqlGatewayServiceImpl.lambda$executeStatement$1(SqlGatewayServiceImpl.java:212)
    at org.apache.flink.table.gateway.service.operation.OperationManager.lambda$submitOperation$1(OperationManager.java:119)
    at org.apache.flink.table.gateway.service.operation.OperationManager$Operation.lambda$run$0(OperationManager.java:258)
    ... 7 more
Caused by: java.lang.RuntimeException: Failed to create namespace db_rmoff in Hive Metastore
    at org.apache.iceberg.hive.HiveCatalog.createNamespace(HiveCatalog.java:294)
    at org.apache.iceberg.flink.FlinkCatalog.createDatabase(FlinkCatalog.java:222)
    at org.apache.iceberg.flink.FlinkCatalog.createDatabase(FlinkCatalog.java:213)
    at org.apache.flink.table.catalog.CatalogManager.createDatabase(CatalogManager.java:1381)
    at org.apache.flink.table.operations.ddl.CreateDatabaseOperation.execute(CreateDatabaseOperation.java:84)
    ... 14 more
Caused by: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:39343)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:39311)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result.read(ThriftHiveMetastore.java:39245)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_database(ThriftHiveMetastore.java:1106)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_database(ThriftHiveMetastore.java:1093)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:811)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
    at com.sun.proxy.$Proxy35.createDatabase(Unknown Source)
    at org.apache.iceberg.hive.HiveCatalog.lambda$createNamespace$8(HiveCatalog.java:283)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
    at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
    at org.apache.iceberg.hive.HiveCatalog.createNamespace(HiveCatalog.java:281)
    ... 18 more

Diagnostics

Verify that hadoop-aws is on the classpath:

❯ ps -ef|grep sql-client|grep hadoop-aws
  501 51499 45632   0  7:38pm ttys007    0:06.81 /Users/rmoff/.sdkman/candidates/java/current/bin/java -XX:+IgnoreUnrecognizedVMOptions --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED -Dlog.file=/Users/rmoff/flink/flink-1.18.1/log/flink-rmoff-sql-client-asgard08.log -Dlog4j.configuration=file:/Users/rmoff/flink/flink-1.18.1/conf/log4j-cli.properties -Dlog4j.configurationFile=file:/Users/rmoff/flink/flink-1.18.1/conf/log4j-cli.properties -Dlogback.configurationFile=file:/Users/rmoff/flink/flink-1.18.1/conf/logback.xml -classpath /Users/rmoff/flink/flink-1.18.1/lib/aws/hadoop-aws-3.3.6.jar:/Users/rmoff/flink/flink-1.18.1/lib/flink-cep-1.18.1.jar:/Users/rmoff/flink/flink-1.18.1/l[…]

Confirm that the JAR includes the S3AFileSystem class:

❯ jar tvf lib/aws/hadoop-aws-3.3.6.jar|grep -i filesystem.class
157923 Sun Jun 18 08:56:00 BST 2023 org/apache/hadoop/fs/s3a/S3AFileSystem.class
  3821 Sun Jun 18 08:56:02 BST 2023 org/apache/hadoop/fs/s3native/NativeS3FileSystem.class

I get the same error if I strip the CREATE CATALOG back to the bare bones:

Flink SQL> CREATE CATALOG c_iceberg_hive2 WITH (
>    'type' = 'iceberg',
>    'warehouse' = 's3a://warehouse',
>    'catalog-type'='hive',
>    'uri'='thrift://localhost:9083');
[INFO] Execute statement succeed.

Flink SQL> USE CATALOG c_iceberg_hive2;
[INFO] Execute statement succeed.

Flink SQL> CREATE DATABASE db_rmoff;
[ERROR] Could not execute SQL statement. Reason:
MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)

Java version:

❯ java --version
openjdk 11.0.21 2023-10-17
OpenJDK Runtime Environment Temurin-11.0.21+9 (build 11.0.21+9)
OpenJDK 64-Bit Server VM Temurin-11.0.21+9 (build 11.0.21+9, mixed mode)

Edit 01

Other things I've tried:

1. Using Flink 1.17.1 (to match the 1.17 version of the Iceberg jar)
2. Using Hadoop 3.3.4 components throughout
3. Moving the jars into ./lib itself rather than subfolders
4. Removing the Flink s3-fs-hadoop plugin
5. Adding iceberg-aws-bundle-1.4.3.jar and aws-java-sdk-bundle-1.12.648.jar (separately and together)
6. Writing Parquet to S3 (MinIO) with the same setup, which works just fine.

More diagnostics:

If I put the three SQL statements (CREATE CATALOG / USE CATALOG / CREATE DATABASE) in a file and launch the SQL client with verbose class logging:

JVM_ARGS=-verbose:class ./bin/sql-client.sh -f ../iceberg.sql > iceberg.log

I get output showing that the hadoop-aws JAR is not being picked up, even though it is on the classpath.

If I add the Flink s3-fs-hadoop plugin back, I can see in the log that it does get picked up, but I still hit the same failure.

Edit 02

If I switch from s3a to s3 I get a different error ¯\_(ツ)_/¯

Flink SQL> CREATE CATALOG c_iceberg_hive WITH (
>     'type' = 'iceberg',
>     'client.assume-role.region' = 'us-east-1',
>     'warehouse' = 's3://warehouse',
>     's3.endpoint' = 'http://localhost:9000',
>     's3.path-style-access' = 'true',
>     'catalog-type'='hive',
>     'uri'='thrift://localhost:9083');
[INFO] Execute statement succeed.

Flink SQL> USE CATALOG c_iceberg_hive;
[INFO] Execute statement succeed.

Flink SQL> CREATE DATABASE db_rmoff;
[ERROR] Could not execute SQL statement. Reason:
MetaException(message:Got exception: org.apache.hadoop.fs.UnsupportedFileSystemException No FileSystem for scheme "s3")

If I add io-impl I get yet another different error, which (with my limited understanding) again seems to suggest that the hadoop-aws JAR is not being picked up.

Ale*_*nko 2

The error you're seeing originates from the Hive Metastore server, not from Flink:
Caused by: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:39343)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:39311)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result.read(ThriftHiveMetastore.java:39245)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_database(ThriftHiveMetastore.java:1106)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_database(ThriftHiveMetastore.java:1093)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:811)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
    at com.sun.proxy.$Proxy35.createDatabase(Unknown Source)
    at org.apache.iceberg.hive.HiveCatalog.lambda$createNamespace$8(HiveCatalog.java:283)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)

This shows that the error is being returned over the Hive Thrift API.
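One way to confirm this server-side (a sketch, not a definitive check: "hms-standalone" is the container name used in the question, and the search is deliberately broad because the image's internal Hadoop layout isn't documented here) is to look for hadoop-aws inside the running metastore container:

```shell
# Sketch: does the running Hive Metastore container ship hadoop-aws at all?
# Assumes the container started as "hms-standalone" per the question's docker run.
docker exec hms-standalone sh -c 'find / -name "hadoop-aws*.jar" 2>/dev/null'
# No output here would corroborate that S3AFileSystem is missing on the server,
# regardless of what is on the Flink SQL client's classpath.
```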

The Docker image used here to run Hive does not include hadoop-aws. You'll need to add it yourself, or use a different Hive image that bundles the required dependencies.
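A minimal sketch of the first option, assuming the image follows a conventional layout (the target lib directory and the aws-java-sdk-bundle version paired with hadoop-aws 3.3.6 are assumptions to verify against the actual image):

```dockerfile
# Hypothetical Dockerfile layering S3A support onto the standalone metastore image.
FROM ghcr.io/recap-build/hive-metastore-standalone:latest

# hadoop-aws provides org.apache.hadoop.fs.s3a.S3AFileSystem; it also needs the
# AWS SDK bundle it was built against. Both the destination directory and the
# SDK version below are assumptions -- check the image's classpath first.
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.6/hadoop-aws-3.3.6.jar \
    /opt/hive-metastore/lib/
ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.367/aws-java-sdk-bundle-1.12.367.jar \
    /opt/hive-metastore/lib/
```

Build and run it in place of the stock image, e.g. docker build -t hms-s3a . followed by docker run --rm -p 9083:9083 hms-s3a, then retry the CREATE DATABASE.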


Archived:

Views:

314

Last active:

1 year, 8 months ago