spark.table fails with java.io.IOException: No FileSystem for scheme: abfs

Question


ven*_*ata 6 apache-spark apache-spark-sql

We have a custom file system class that extends hadoop.fs.FileSystem. The URI scheme for this file system is abfs://. An external Hive table has been created over data stored on it.

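-- The type of the partition column dt was not given in the original; string is assumed here.
CREATE EXTERNAL TABLE testingCustomFileSystem (a string, b int, c double) PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 'abfs://<host>:<port>/user/name/path/to/data/'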


Using beeline, I can query the table and get results.

Now I am trying to load the same table into a Spark DataFrame using spark.table('testingCustomFileSystem'), and it throws the following exception:

java.io.IOException: No FileSystem for scheme: abfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
  at org.apache.spark.sql.execution.datasources.CatalogFileIndex$$anonfun$2.apply(CatalogFileIndex.scala:77)
  at org.apache.spark.sql.execution.datasources.CatalogFileIndex$$anonfun$2.apply(CatalogFileIndex.scala:75)
  at scala.collection.immutable.Stream.map(Stream.scala:418)


The jar containing the CustomFileSystem (which defines the abfs:// scheme) is loaded on the classpath and is available.

How does spark.table read the Hive table definition from the metastore and resolve the URI?

Answer 1

ven*_*ata 1

After looking through the configuration options in Spark, I found that setting the following Hadoop configuration property resolves the problem.

hadoopConfiguration.set("fs.abfs.impl",<fqcn of the FileSystemImplementation>)
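
To illustrate why this property matters, here is a minimal sketch of the scheme-to-class lookup. It is a simplified model, not the actual Hadoop code: the Hadoop Configuration is replaced by a plain Map, and the object name SchemeLookup, the host example-host:8020, and the class name com.example.fs.CustomFileSystem are all hypothetical. As far as I know, real Hadoop also consults FileSystem implementations registered through the Java service loader, which this sketch leaves out.

```scala
import java.io.IOException
import java.net.URI

// Simplified stand-in for Hadoop's lookup of a FileSystem class by URI scheme.
// NOT the real Hadoop implementation; the Configuration is modelled as a Map.
object SchemeLookup {
  def resolve(location: String, conf: Map[String, String]): String = {
    // Extract the scheme ("abfs") from the table location.
    val scheme = new URI(location).getScheme
    // The implementing class name is read from the "fs.<scheme>.impl" key.
    conf.getOrElse(
      s"fs.$scheme.impl",
      throw new IOException(s"No FileSystem for scheme: $scheme")
    )
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical location and class name, for illustration only.
    val location = "abfs://example-host:8020/user/name/path/to/data/"
    val conf = Map("fs.abfs.impl" -> "com.example.fs.CustomFileSystem")
    println(resolve(location, conf))
  }
}
```

With an empty Map, resolve throws an IOException carrying the same message as the stack trace in the question. This suggests that having the jar on the classpath is not enough on its own: the scheme also has to be mapped to the class.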


In Spark, I applied this setting right after creating the SparkSession (built with only appName and master), like this:

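import org.apache.spark.sql.SparkSession

// SparkSession.Builder exposes appName/master (setAppName/setMaster belong to SparkConf)
val spark = SparkSession
            .builder()
            .appName("Name")
            .master("yarn")
            .getOrCreate()

// Map the abfs scheme to the custom FileSystem class
spark.sparkContext
     .hadoopConfiguration.set("fs.abfs.impl",<fqcn of the FileSystemImplementation>)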


It worked!
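
As a possible alternative to setting the value on the Hadoop configuration after the session exists, the same property can be passed while building the session. Spark copies any property prefixed with spark.hadoop. into the Hadoop configuration it creates. This is a sketch meant for spark-shell or the body of a main method; the class name com.example.fs.CustomFileSystem is hypothetical, and the enableHiveSupport() call is an assumption that the session needs the Hive metastore catalog to see the table.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical class name; substitute the real FQCN of the FileSystem implementation.
val customFsClass = "com.example.fs.CustomFileSystem"

val spark = SparkSession
  .builder()
  .appName("Name")
  .master("yarn")
  // "spark.hadoop."-prefixed keys are copied into the Hadoop configuration
  .config("spark.hadoop.fs.abfs.impl", customFsClass)
  // Assumption: Hive support is required to read the table from the metastore
  .enableHiveSupport()
  .getOrCreate()

// Load the external Hive table from the question into a DataFrame
val df = spark.table("testingCustomFileSystem")
```

The same key can also be supplied at submit time with --conf spark.hadoop.fs.abfs.impl=<fqcn of the FileSystemImplementation>, which keeps the class name out of the application code.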
