How do I enable Tungsten optimization in Spark 2?

Har*_*oed 5 apache-spark apache-spark-sql pyspark apache-spark-2.0

I just built Spark 2 with Hive support and deployed it to a Hortonworks 2.3.4 cluster. But I found that this Spark 2.0.3 is slower than the stock Spark 1.5.3 that ships with HDP 2.3.

When I check explain, it seems my Spark 2.0.3 is not using Tungsten. Do I need a special build to enable Tungsten?

Spark 1.5.3 explain

== Physical Plan ==
TungstenAggregate(key=[id#2], functions=[], output=[id#2])
 TungstenExchange hashpartitioning(id#2)
  TungstenAggregate(key=[id#2], functions=[], output=[id#2])
   HiveTableScan [id#2], (MetastoreRelation default, testing, None)

Spark 2.0.3 explain

== Physical Plan ==
*HashAggregate(keys=[id#2481], functions=[])
+- Exchange hashpartitioning(id#2481, 72)
   +- *HashAggregate(keys=[id#2481], functions=[])
      +- HiveTableScan [id#2481], MetastoreRelation default, testing

Dav*_*ler 1

I believe it is enabled by default, but you can set spark.sql.tungsten.enabled=true explicitly.
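A minimal PySpark sketch of setting that flag when building the session (the table name `testing` comes from the question's plans; note that in Spark 2.x the leading `*` in the physical plan marks whole-stage code generation, so the plan in the question already suggests Tungsten is active):

```python
from pyspark.sql import SparkSession

# Build a Hive-enabled session; the config key is the one named in the
# answer above (in Spark 2.x it should be a no-op since Tungsten is
# part of the engine by default).
spark = (SparkSession.builder
         .appName("tungsten-check")
         .config("spark.sql.tungsten.enabled", "true")
         .enableHiveSupport()
         .getOrCreate())

# Re-run the aggregation from the question and inspect the physical plan;
# operators prefixed with "*" are executed via whole-stage codegen.
spark.sql("SELECT id FROM default.testing GROUP BY id").explain()
```

This requires a live cluster with the `testing` Hive table, so treat it as a configuration sketch rather than a standalone script.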