I have a DataFrame with data like the following:
datetime             | userId | memberId | value
2016-04-06 16:36:... | 1234   | 111      | 1
2016-04-06 17:35:... | 1234   | 222      | 5
2016-04-06 17:50:... | 1234   | 111      | 8
2016-04-06 18:36:... | 1234   | 222      | 9
2016-04-05 16:36:... | 4567   | 111      | 1
2016-04-06 17:35:... | 4567   | 222      | 5
2016-04-06 18:50:... | 4567   | 111      | 8
2016-04-06 19:36:... | 4567   | 222      | 9
I need to find max(datetime) for each (userId, memberId) group. This is what I tried:
df2 = df.groupBy('userId','memberId').max('datetime')
The error I get is:
org.apache.spark.sql.AnalysisException: "datetime" …