spark.sql.hive.filesourcePartitionFileCacheSize

Question

Does anyone know what this warning message means?

18/01/10 19:52:56 WARN SharedInMemoryCache: Evicting cached table partition metadata from memory due to size constraints
(spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes). This may impact query planning performance

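I see this often when loading large DataFrames with many partitions from S3 into Spark.

It never actually causes any problems for the job; I would just like to know what this configuration property is for and how to tune it properly.

Thanks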

Answer 1

Gou*_*tta 5

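To answer your question: this is a Spark-Hive specific configuration property that, when nonzero, enables caching of partition file metadata in memory. All tables share a single cache, which can use up to the specified number of bytes for file metadata. The setting only has an effect when Hive filesource partition management is enabled.

That is how it is described in the Spark source code. According to the code, the default size is 250 * 1024 * 1024 bytes. You can try changing it through the SparkConf object in your code or through the spark-submit command.

Spark source code: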
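As a minimal sketch of the "set it in code" route, the property can be passed when the session is built. The 1 GiB value, the object name, and the app name are illustrative assumptions, not recommendations. The equivalent spark-submit form is shown in a comment. Setting the value at startup is assumed to be the safest option, since the cache is shared by all tables.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and sizes are illustrative only.
object PartitionCacheSizeExample {
  def main(args: Array[String]): Unit = {
    // Assumed target size: 1 GiB, expressed in bytes.
    val cacheSizeBytes: Long = 1024L * 1024L * 1024L

    // The master URL is expected to come from spark-submit.
    // Command-line equivalent:
    //   spark-submit --conf spark.sql.hive.filesourcePartitionFileCacheSize=1073741824 ...
    val spark = SparkSession.builder()
      .appName("partition-cache-size-example")
      .config("spark.sql.hive.filesourcePartitionFileCacheSize", cacheSizeBytes)
      .getOrCreate()

    // Read the value back to confirm that the session picked it up.
    println(spark.conf.get("spark.sql.hive.filesourcePartitionFileCacheSize"))

    spark.stop()
  }
}
```

A larger cache trades driver memory for fewer metadata evictions, so whether raising it helps depends on how many partitions and files the queries touch.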

val HIVE_FILESOURCE_PARTITION_FILE_CACHE_SIZE =
    buildConf("spark.sql.hive.filesourcePartitionFileCacheSize")
      .doc("When nonzero, enable caching of partition file metadata in memory. All tables share " +
           "a cache that can use up to specified num bytes for file metadata. This conf only " +
           "has an effect when hive filesource partition management is enabled.")
      .longConf
      .createWithDefault(250 * 1024 * 1024)
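
The default in this excerpt is the same number that appears in the warning from the question (262144000 bytes, i.e. 250 MiB). A small sketch of the arithmetic, using a hypothetical helper `mibToBytes` that is not part of Spark, may help when choosing a new value in bytes:

```scala
// Hypothetical helper for converting MiB to the byte count the property expects.
object CacheSizeArithmetic {
  val BytesPerMiB: Long = 1024L * 1024L

  def mibToBytes(mib: Long): Long = mib * BytesPerMiB

  def main(args: Array[String]): Unit = {
    // Default from the Spark source: 250 * 1024 * 1024
    println(mibToBytes(250))   // 262144000, the value shown in the warning
    // An example larger setting of 1024 MiB (1 GiB)
    println(mibToBytes(1024))  // 1073741824
  }
}
```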

Archived:	7 years, 8 months ago
Views:	14236
Last activity:	6 years, 7 months ago