Posts by pra*_*ash

Comparing two Spark DataFrames

Spark DataFrame 1:

+------+-------+---------+----+---+-------+
|city  |product|date     |sale|exp|wastage|
+------+-------+---------+----+---+-------+
|city 1|prod 1 |9/29/2017|358 |975|193    |
|city 1|prod 2 |8/25/2017|50  |687|201    |
|city 1|prod 3 |9/9/2017 |236 |431|169    |
|city 2|prod 1 |9/28/2017|358 |975|193    |
|city 2|prod 2 |8/24/2017|50  |687|201    |
|city 3|prod 3 |9/8/2017 |236 |431|169    |
+------+-------+---------+----+---+-------+

Spark DataFrame 2:

+------+-------+---------+----+---+-------+
|city  |product|date     |sale|exp|wastage|
+------+-------+---------+----+---+-------+
|city 1|prod 1 |9/29/2017|358 |975|193    |
|city 1|prod 2 |8/25/2017|50  |687|201    |
|city 1|prod 3 |9/9/2017 |230 |430|160    |
|city 1|prod 4 |9/27/2017|350 |90 |190    |
|city 2|prod 2 …
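The question is cut off here, so the intended comparison is not known. As one possible reading, here is a minimal sketch in Scala that lists the rows present in one DataFrame but absent from the other, using `except`. The object name `CompareFrames`, the helper `onlyIn`, and the local `SparkSession` are assumptions for illustration, and DataFrame 2 is built only from the four rows that are fully visible above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names; a sketch of one way to compare, not the asker's method.
object CompareFrames {

  // Distinct rows of `left` that have no identical row (across all columns) in `right`.
  def onlyIn(left: DataFrame, right: DataFrame): DataFrame = left.except(right)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compare-frames")
      .getOrCreate()
    import spark.implicits._

    val columns = Seq("city", "product", "date", "sale", "exp", "wastage")

    val df1 = Seq(
      ("city 1", "prod 1", "9/29/2017", 358, 975, 193),
      ("city 1", "prod 2", "8/25/2017", 50, 687, 201),
      ("city 1", "prod 3", "9/9/2017", 236, 431, 169),
      ("city 2", "prod 1", "9/28/2017", 358, 975, 193),
      ("city 2", "prod 2", "8/24/2017", 50, 687, 201),
      ("city 3", "prod 3", "9/8/2017", 236, 431, 169)
    ).toDF(columns: _*)

    // Only the rows of DataFrame 2 that are not truncated in the question.
    val df2 = Seq(
      ("city 1", "prod 1", "9/29/2017", 358, 975, 193),
      ("city 1", "prod 2", "8/25/2017", 50, 687, 201),
      ("city 1", "prod 3", "9/9/2017", 230, 430, 160),
      ("city 1", "prod 4", "9/27/2017", 350, 90, 190)
    ).toDF(columns: _*)

    onlyIn(df1, df2).show(false) // rows in DataFrame 1 but not in DataFrame 2
    onlyIn(df2, df1).show(false) // rows in DataFrame 2 but not in DataFrame 1

    spark.stop()
  }
}
```

`except` compares whole rows, so a row whose `sale`, `exp`, or `wastage` value changed appears in both results. If the comparison should instead be keyed on `city` and `product`, a join on those columns may be a better fit.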

apache-spark apache-spark-sql

6 votes
2 answers
6853 views

How to get data from the previous row in Apache Spark

Find the previous month's sale for each city from a Spark DataFrame

+----+--------+----+
|City|   Month|Sale|
+----+--------+----+
|  c1|JAN-2017|  49|
|  c1|FEB-2017|  46|
|  c1|MAR-2017|  83|
|  c2|JAN-2017|  59|
|  c2|MAY-2017|  60|
|  c2|JUN-2017|  49|
|  c2|JUL-2017|  73|
+----+--------+----+

The desired output is

+----+--------+----+-------------+
|City|   Month|Sale|previous_sale|
+----+--------+----+-------------+
|  c1|JAN-2017|  49|         NULL|
|  c1|FEB-2017|  46|           49|
|  c1|MAR-2017|  83|           46|
|  c2|JAN-2017|  59|         NULL|
|  c2|MAY-2017|  60|           59| …
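One common way to read a value from the preceding row is the `lag` window function. The sketch below assumes that "previous" means the preceding row for the same city in chronological order, which matches the c2 MAY-2017 row above, where the previous sale comes from JAN-2017. The names `PreviousSale`, `withPreviousSale`, and `monthKey` are hypothetical. The sort key is computed by a small UDF on the assumption that `Month` always has the form `MMM-yyyy` with an upper-case English month abbreviation, since sorting the raw string would place FEB before JAN.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, udf}

// Hypothetical names; a sketch under the assumptions stated above.
object PreviousSale {

  private val months =
    Seq("JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

  // Turns "MAR-2017" into a number that grows with calendar time (year * 12 + month index).
  // Assumption: every value looks like "MMM-yyyy"; other inputs are not handled.
  private val monthKey = udf { (month: String) =>
    val Array(abbr, year) = month.split("-")
    year.toInt * 12 + months.indexOf(abbr)
  }

  // Adds `previous_sale`: the Sale of the preceding row within the same City,
  // with rows ordered chronologically. The first row of each city gets null.
  def withPreviousSale(df: DataFrame): DataFrame = {
    val byCityInTimeOrder = Window.partitionBy("City").orderBy(monthKey(col("Month")))
    df.withColumn("previous_sale", lag(col("Sale"), 1).over(byCityInTimeOrder))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("previous-sale")
      .getOrCreate()
    import spark.implicits._

    val sales = Seq(
      ("c1", "JAN-2017", 49),
      ("c1", "FEB-2017", 46),
      ("c1", "MAR-2017", 83),
      ("c2", "JAN-2017", 59),
      ("c2", "MAY-2017", 60),
      ("c2", "JUN-2017", 49),
      ("c2", "JUL-2017", 73)
    ).toDF("City", "Month", "Sale")

    withPreviousSale(sales).show()

    spark.stop()
  }
}
```

`lag` looks at the preceding row, not the preceding calendar month, so gaps such as FEB to APR for c2 are skipped over. If only the immediately preceding calendar month should count, a self-join on the month key offset by one would be closer to that requirement.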

scala dataframe apache-spark

2 votes
1 answer
3698 views

Tag statistics

apache-spark ×2

apache-spark-sql ×1

dataframe ×1

scala ×1