Disabling the Spark Catalyst optimizer

aja*_*ore 5 optimization query-optimization apache-spark apache-spark-sql spark-dataframe

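To give some background, I am trying to run the TPC-DS benchmark on Spark with and without Spark's Catalyst optimizer. For complex queries on smaller datasets, we may spend more time optimizing the plan than actually executing it. I therefore want to measure the optimizer's performance impact on overall query execution.

Is there a way to disable some or all of the Spark Catalyst optimizer rules?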

DaR*_*MaN 3

This ability was added in Spark 2.4.0 as part of SPARK-24802.

val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
    .doc("Configures a list of rules to be disabled in the optimizer, in which the rules are " +
      "specified by their rule names and separated by comma. It is not guaranteed that all the " +
      "rules in this configuration will eventually be excluded, as some rules are necessary " +
      "for correctness. The optimizer will log the rules that have indeed been excluded.")
    .stringConf
    .createOptional
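
As a rough illustration of how this configuration might be used, the sketch below sets `spark.sql.optimizer.excludedRules` on a running session and prints the plans. The two rule names, the object name `ExcludedRulesSketch`, and the toy query are assumptions chosen for illustration; rules are given by their fully qualified class names, and Spark may still keep a rule it considers necessary for correctness.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object ExcludedRulesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("excluded-rules-sketch")
      .master("local[*]")
      .getOrCreate()

    // Comma-separated, fully qualified rule names (example choices, not a recommendation).
    val excluded = Seq(
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
      "org.apache.spark.sql.catalyst.optimizer.CollapseProject"
    ).mkString(",")
    spark.conf.set("spark.sql.optimizer.excludedRules", excluded)

    // Toy query containing a constant expression and stacked projections.
    val df = spark.range(1000)
      .selectExpr("id + (1 + 2) AS x")
      .filter("x > 10")

    // Prints the parsed, analyzed, optimized and physical plans for comparison.
    df.explain(true)

    spark.stop()
  }
}
```

Running the same query with and without the setting and comparing the optimized plans is one way to confirm that a given rule was actually skipped.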


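You can find the list of optimizer rules here.
Ideally, though, we should not disable these rules, since most of them provide performance benefits. We should first identify the rules that consume time, check whether they bring no benefit for the query, and only then disable them.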
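
One possible way to find the time-consuming rules is Catalyst's internal rule metering. The sketch below assumes Spark 2.4 or later and uses `RuleExecutor.resetMetrics()` and `RuleExecutor.dumpTimeSpent()`, which are internal Catalyst utilities rather than a stable public API; the object name `RuleTimingSketch` and the query are hypothetical. The report covers all rule executors, so analyzer rules appear alongside optimizer rules.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Hypothetical driver object, for illustration only.
object RuleTimingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rule-timing-sketch")
      .master("local[*]")
      .getOrCreate()

    // Clear timings accumulated by earlier queries in this JVM.
    RuleExecutor.resetMetrics()

    // Stand-in query; a TPC-DS query would go here instead.
    val df = spark.range(100000)
      .selectExpr("id % 10 AS k", "id AS v")
      .groupBy("k")
      .count()

    // Accessing the lazy optimizedPlan forces analysis and optimization without executing the query.
    val optimized = df.queryExecution.optimizedPlan

    // Per-rule time and run counts collected since the reset.
    println(RuleExecutor.dumpTimeSpent())

    spark.stop()
  }
}
```

Rules that show a large share of the total time but leave the plan unchanged for the queries of interest would be the natural candidates for `spark.sql.optimizer.excludedRules`.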