How do I enable Tungsten optimization in Spark 2?

Har*_*oed 5 apache-spark apache-spark-sql pyspark apache-spark-2.0

I just built Spark 2 with Hive support and deployed it to a Hortonworks 2.3.4 cluster. But I found that this Spark 2.0.3 is slower than the stock Spark 1.5.3 that ships with HDP 2.3.

When I check explain, it seems my Spark 2.0.3 is not using Tungsten. Do I need a special build to enable Tungsten?

Spark 1.5.3 explain

== Physical Plan ==
TungstenAggregate(key=[id#2], functions=[], output=[id#2])
 TungstenExchange hashpartitioning(id#2)
  TungstenAggregate(key=[id#2], functions=[], output=[id#2])
   HiveTableScan [id#2], (MetastoreRelation default, testing, None)

Spark 2.0.3 explain

== Physical Plan ==
*HashAggregate(keys=[id#2481], functions=[])
+- Exchange hashpartitioning(id#2481, 72)
   +- *HashAggregate(keys=[id#2481], functions=[])
      +- HiveTableScan [id#2481], MetastoreRelation default, testing

Dav*_*ler 1

I believe it is enabled by default, but you can set spark.sql.tungsten.enabled=true explicitly.
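A minimal PySpark sketch of setting that flag when building the session (the table name `testing` comes from the question's plans; note that in Spark 2.x the leading `*` in the physical plan marks whole-stage code generation, so the plan in the question already suggests Tungsten is active):

```python
from pyspark.sql import SparkSession

# Build a Hive-enabled session; the config key is the one named in the
# answer above (in Spark 2.x it should be a no-op since Tungsten is
# part of the engine by default).
spark = (SparkSession.builder
         .appName("tungsten-check")
         .config("spark.sql.tungsten.enabled", "true")
         .enableHiveSupport()
         .getOrCreate())

# Re-run the aggregation from the question and inspect the physical plan;
# operators prefixed with "*" are executed via whole-stage codegen.
spark.sql("SELECT id FROM default.testing GROUP BY id").explain()
```

This requires a live cluster with the `testing` Hive table, so treat it as a configuration sketch rather than a standalone script.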