
Spark SQL broadcast hash join

I am trying to perform a broadcast hash join on DataFrames using Spark SQL, following this example: https://docs.cloud.databricks.com/docs/latest/databricks_guide/06%20Spark%20SQL%20%26%20DataFrames/05%20BroadcastHashJoin%20-%20scala.html

In that example, the (small) DataFrame is persisted via `saveAsTable` and then joined through Spark SQL (i.e., via `sqlContext.sql("...")`).

The problem is that I need to use the Spark SQL DataFrame API to construct my queries (I am joining ~50 tables with a list of IDs, and I don't want to write the SQL by hand).

How do I tell Spark to use the broadcast hash join via the API? The issue is that if I load the ID list (from the table persisted via `saveAsTable`) into a `DataFrame` to use in the join, it isn't clear to me whether Spark can still apply the broadcast hash join.
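For reference, the DataFrame API does expose an explicit hint for this: wrapping the small side of the join in the `broadcast()` function from `org.apache.spark.sql.functions` (available since Spark 1.5) tells the planner to use a broadcast hash join regardless of the size-estimate threshold. A minimal sketch, with hypothetical table and column names:

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical tables: `id_list` is the small table persisted via
// saveAsTable; `big_table` stands in for one of the ~50 large tables.
val ids = sqlContext.table("id_list")
val large = sqlContext.table("big_table")

// Wrapping the small side in broadcast() marks it for broadcasting,
// so the planner chooses a broadcast hash join over a shuffle join.
val joined = large.join(broadcast(ids), Seq("id"))
```

The same `broadcast(ids)` hint can be reused across all ~50 joins built programmatically, and `joined.explain()` can be used to confirm that a `BroadcastHashJoin` appears in the physical plan.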

apache-spark apache-spark-sql

8 votes · 2 answers · 10K views
