Cumsum with reset on a flag column in R?

Nor*_*ude 4 for-loop if-statement r cumsum

This is my first time asking a question, so please bear with me.

My dataset (df) looks like this:

animal   azimuth   south   distance
 pb1      187.561   1       1.992 
 pb1      147.219   1       8.567
 pb1      71.032    0       5.754
 pb1      119.502   1       10.451
 pb2      101.702   1       9.227
 pb2      85.715    0       8.821

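I would like to create an additional column (df$cumdist) that accumulates distance, but only within each individual animal and only when df$south == 1. I want the cumulative sum to reset whenever df$south == 0.

This is the result I am after (done by hand):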

animal   azimuth   south   distance  cumdist
 pb1      187.561   1       1.992     1.992
 pb1      147.219   1       8.567     10.559 
 pb1      71.032    0       5.754     0 
 pb1      119.502   1       10.451    10.451
 pb2      101.702   1       9.227     9.227 
 pb2      85.715    0       8.821     0

This is the code I tried in order to implement the cumsum:

swim.az$cumdist <- cumsum(ifelse(swim.az$south==1, swim.az$distance, 0))

While it successfully stops adding when df$south == 0, it does not reset. I also know I will need to embed it in a for loop to subset by animal.

Thank you very much!

akr*_*run 5

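We multiply 'south' by 'distance' (stored as 'cumdist') so that the 'distance' values on rows where 'south' is 0 become 0. We then group by 'animal' and by a grouping vector (grp) created by taking the cumulative sum of the logical vector (south == 0), take the cumsum of 'cumdist' within each group, ungroup, and remove the column that is no longer needed (grp).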

library(dplyr)
dfN %>% 
  mutate(cumdist = south * distance) %>%
  group_by(animal, grp = cumsum(south == 0)) %>%
  mutate(cumdist = cumsum(cumdist)) %>%
  ungroup %>%
  select(-grp)
# A tibble: 6 x 5
#  animal azimuth south distance cumdist
#  <chr>    <dbl> <int>    <dbl>   <dbl>
#1 pb1      188.      1     1.99    1.99
#2 pb1      147.      1     8.57   10.6 
#3 pb1       71.0     0     5.75    0   
#4 pb1      120.      1    10.5    10.5 
#5 pb2      102.      1     9.23    9.23
#6 pb2       85.7     0     8.82    0   

Or a similar approach in base R:

with(dfN, ave(distance * south, animal, cumsum(!south), FUN = cumsum))
#[1]  1.992 10.559  0.000 10.451  9.227  0.000

Data
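To make the reset mechanism easier to follow, here is a minimal step-by-step sketch of what the grouping vector looks like for the example data. The intermediate names (masked, grp) are illustrative only and do not appear in the answer's code.

```r
# Same values as the example data, built inline so the sketch is self-contained.
animal   <- c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2")
south    <- c(1L, 1L, 0L, 1L, 1L, 0L)
distance <- c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)

# Step 1: zero out the distance on rows where south == 0.
masked <- south * distance

# Step 2: the counter steps up at every south == 0 row, so each such row
# starts a new run.
grp <- cumsum(south == 0)
grp
# 0 0 1 1 1 2

# Step 3: cumulative sum within each (animal, grp) combination.
cumdist <- ave(masked, animal, grp, FUN = cumsum)
```

Rows 4 (pb1) and 5 (pb2) share the same grp value, which is why 'animal' also has to be part of the grouping: without it, the sum would carry over from one animal to the next.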

dfN <- structure(list(animal = c("pb1", "pb1", "pb1", "pb1", "pb2", 
"pb2"), azimuth = c(187.561, 147.219, 71.032, 119.502, 101.702, 
85.715), south = c(1L, 1L, 0L, 1L, 1L, 0L), distance = c(1.992, 
8.567, 5.754, 10.451, 9.227, 8.821)), class = "data.frame", 
row.names = c(NA, -6L))
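For comparison with the for loop mentioned in the question, the same reset logic can also be written out explicitly. This is only a sketch, not part of the answer above, and it assumes the rows of each animal are contiguous and already in the desired order; the variable names (running, new_animal) are made up for illustration.

```r
dfN <- data.frame(
  animal   = c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2"),
  south    = c(1L, 1L, 0L, 1L, 1L, 0L),
  distance = c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)
)

cumdist <- numeric(nrow(dfN))
running <- 0
for (i in seq_len(nrow(dfN))) {
  # Reset on the first row of each animal and on every south == 0 row.
  new_animal <- i == 1 || dfN$animal[i] != dfN$animal[i - 1]
  if (new_animal || dfN$south[i] == 0) running <- 0
  # Only rows with south == 1 contribute to the running total.
  if (dfN$south[i] == 1) running <- running + dfN$distance[i]
  cumdist[i] <- running
}
dfN$cumdist <- cumdist
```

The vectorised versions above are usually preferable; the loop is mainly useful for seeing exactly where each reset happens.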