I have a very large dataset that, in simplified form, looks like this:
row. member_id entry_id comment_count timestamp
1 1 a 4 2008-06-09 12:41:00
2 1 b 1 2008-07-14 18:41:00
3 1 c 3 2008-07-17 15:40:00
4 2 d 12 2008-06-09 12:41:00
5 2 e 50 2008-09-18 10:22:00
6 3 f 0 2008-10-03 13:36:00
I can compute the cumulative counts with the following code:
transform(df, aggregated_count = ave(comment_count, member_id, FUN = cumsum))
But I want the cumulative sum lagged by 1; in other words, I want cumsum to ignore the current row. The result should be:
row. member_id entry_id comment_count timestamp previous_comments
1 1 a 4 2008-06-09 12:41:00 0
2 1 b 1 2008-07-14 18:41:00 4
3 1 c 3 2008-07-17 15:40:00 …
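One way to express "cumsum ignoring the current row" is to subtract each row's own count from the running total within its member_id group, so each row only sees comments from earlier rows. A minimal sketch, assuming the data frame is named df as in the ave() call above and that rows are already ordered by timestamp within each member_id (timestamps are left out of the rebuilt sample data for brevity):

```r
# Rebuild the sample data shown above (timestamp column omitted for brevity)
df <- data.frame(
  member_id     = c(1, 1, 1, 2, 2, 3),
  entry_id      = c("a", "b", "c", "d", "e", "f"),
  comment_count = c(4, 1, 3, 12, 50, 0)
)

# Lagged cumulative sum: the running total within each member_id minus the
# current row's own count, i.e. the sum of comment_count over earlier rows only.
# Assumes rows are already sorted by timestamp within each member_id.
df <- transform(
  df,
  previous_comments = ave(comment_count, member_id,
                          FUN = function(x) cumsum(x) - x)
)

print(df)
```

An equivalent FUN is function(x) c(0, head(cumsum(x), -1)), which shifts the running total down by one row and puts 0 in front; the subtraction form is slightly shorter and avoids building a new vector.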