Posts by Sch*_*äbo

Spark DataFrame range join is slow

I have the following input data (in Parquet) for a Spark job:

Person (millions of rows)
+---------+----------+---------------+---------------+
|  name   | location |     start     |      end      |
+---------+----------+---------------+---------------+
| Person1 |     1230 | 1478630000001 | 1478630000010 |
| Person2 |     1230 | 1478630000002 | 1478630000012 |
| Person2 |     1230 | 1478630000013 | 1478630000020 |
| Person3 |     3450 | 1478630000001 | 1478630000015 |
+---------+----------+---------------+---------------+


Event (millions of rows)
+----------+----------+---------------+
|  event   | location |  start_time   |
+----------+----------+---------------+
| Biking   |     1230 | 1478630000005 |
| Skating  |     1230 | 1478630000014 | …
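The join condition is cut off above, so what follows is only a sketch under an assumption: each event should be matched to the people at the same `location` whose `[start, end]` interval contains the event's `start_time`. If `location` is the only equality key, every person and event sharing a location are compared pairwise before the range filter applies, which is a plausible reason such a join is slow. One widely used workaround (not necessarily the one given in the answer) is binning: split time into fixed-width buckets, register each person interval under every bucket it covers, join on `(location, bucket)` as a plain equi-join, and then re-check the exact range. The plain-Java code below illustrates that logic on the visible sample rows; `RangeJoinSketch`, `bucketedJoin` and the bucket width of 10 are hypothetical names and values chosen for illustration, not Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a bucketed (binned) range join.
 * Assumed join condition: same location AND person.start <= event.startTime <= person.end.
 */
public class RangeJoinSketch {

    /** One row of the Person input. */
    static final class Person {
        final String name;
        final int location;
        final long start;
        final long end;

        Person(String name, int location, long start, long end) {
            this.name = name;
            this.location = location;
            this.start = start;
            this.end = end;
        }
    }

    /** One row of the Event input. */
    static final class Event {
        final String event;
        final int location;
        final long startTime;

        Event(String event, int location, long startTime) {
            this.event = event;
            this.location = location;
            this.startTime = startTime;
        }
    }

    /** Composite equi-join key: location plus time bucket. */
    private static String key(int location, long bucket) {
        return location + ":" + bucket;
    }

    /** Returns "person -> event" pairs; bucketSize is an assumed tuning parameter. */
    static List<String> bucketedJoin(List<Person> persons, List<Event> events, long bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        // Build side: register each person interval under every bucket it overlaps.
        Map<String, List<Person>> index = new HashMap<>();
        for (Person p : persons) {
            long first = Math.floorDiv(p.start, bucketSize);
            long last = Math.floorDiv(p.end, bucketSize);
            for (long b = first; b <= last; b++) {
                index.computeIfAbsent(key(p.location, b), k -> new ArrayList<>()).add(p);
            }
        }
        // Probe side: an event timestamp falls into exactly one bucket, so this is a hash lookup.
        List<String> matches = new ArrayList<>();
        for (Event e : events) {
            String k = key(e.location, Math.floorDiv(e.startTime, bucketSize));
            for (Person p : index.getOrDefault(k, Collections.emptyList())) {
                // The shared bucket is only a coarse filter; re-check the exact range.
                if (p.start <= e.startTime && e.startTime <= p.end) {
                    matches.add(p.name + " -> " + e.event);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Rows taken from the visible part of the sample tables.
        List<Person> persons = Arrays.asList(
                new Person("Person1", 1230, 1478630000001L, 1478630000010L),
                new Person("Person2", 1230, 1478630000002L, 1478630000012L),
                new Person("Person2", 1230, 1478630000013L, 1478630000020L),
                new Person("Person3", 3450, 1478630000001L, 1478630000015L));
        List<Event> events = Arrays.asList(
                new Event("Biking", 1230, 1478630000005L),
                new Event("Skating", 1230, 1478630000014L));
        for (String match : bucketedJoin(persons, events, 10L)) {
            System.out.println(match);
        }
    }
}
```

Running `main` prints `Person1 -> Biking`, `Person2 -> Biking` and `Person2 -> Skating`. The bucket width is the main design knob: buckets much smaller than a typical interval inflate the exploded person side, while very wide buckets put many candidates into each bucket and bring back the pairwise comparison cost. In Spark the same idea would typically be expressed by adding a bucket column to both DataFrames (exploding the person side into one row per covered bucket), joining on location plus bucket, and then applying the exact range filter.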

java apache-spark apache-spark-sql spark-dataframe

Score: 5 · Answers: 1 · Views: 878