How do I set an id for pairs of rows in a data frame?

K8O*_*ter 3 r dplyr

I have a data frame that looks like this:

   File                          Time Behavior Status   diff    id
   <chr>                        <dbl> <chr>    <chr>   <dbl> <int>
 1 K8053121_serial-food-depr_04  389. Protrude START   6.25      1
 2 K8053121_serial-food-depr_04  409. Protrude STOP    3.25      1
 3 K8060221_serial-food-depr_01  669. Protrude START  19.0       1
 4 K8060221_serial-food-depr_01  757. Protrude STOP    0.247     1
 5 K8060221_serial-food-depr_01  864. Protrude START   8.00      1
 6 K8060221_serial-food-depr_01  929. Protrude STOP    0         1
 7 K8060221_serial-food-depr_02  477. Protrude START  25.0       1
 8 K8060221_serial-food-depr_02  502. Protrude STOP    2.00      1
 9 K8060221_serial-food-depr_02  562. Protrude START  22.7       1
10 K8060221_serial-food-depr_02  570. Protrude STOP    5.50      1
11 K8060221_serial-food-depr_02  924. Protrude START  18.3       1
12 K8060221_serial-food-depr_02  958. Protrude STOP    0         1
13 K8060221_serial-food-depr_04  215. Protrude START   5.93      1
14 K8060221_serial-food-depr_04  283. Protrude STOP    0         1
15 K8060221_serial-food-depr_04  291. Protrude START   0.25      1

Here is the dput output:

structure(list(File = c("K8053121_serial-food-depr_04", "K8053121_serial-food-depr_04", 
"K8060221_serial-food-depr_01", "K8060221_serial-food-depr_01", 
"K8060221_serial-food-depr_01", "K8060221_serial-food-depr_01", 
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_02", 
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_02", 
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_02", 
"K8060221_serial-food-depr_04", "K8060221_serial-food-depr_04", 
"K8060221_serial-food-depr_04"), Time = c(388.936, 408.683, 668.534, 
757.371, 863.721, 929.222, 477.278, 501.845, 561.649, 569.901, 
923.537, 957.571, 214.577, 283.075, 291.077), Behavior = c("Protrude", 
"Protrude", "Protrude", "Protrude", "Protrude", "Protrude", "Protrude", 
"Protrude", "Protrude", "Protrude", "Protrude", "Protrude", "Protrude", 
"Protrude", "Protrude"), Status = c("START", "STOP", "START", 
"STOP", "START", "STOP", "START", "STOP", "START", "STOP", "START", 
"STOP", "START", "STOP", "START"), diff = c(6.24899999999997, 
3.24700000000001, 19.0169999999999, 0.246999999999957, 7.99800000000005, 
0, 24.956, 1.99900000000002, 22.749, 5.50099999999998, 18.2660000000001, 
0, 5.92500000000001, 0, 0.25), id = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, -15L), groups = structure(list(
    File = c("K8053121_serial-food-depr_04", "K8060221_serial-food-depr_01", 
    "K8060221_serial-food-depr_02", "K8060221_serial-food-depr_04"
    ), .rows = structure(list(1:2, 3:6, 7:12, 13:15), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

I am trying to produce a data frame that looks roughly like this:

  File                         Behavior    id START  STOP duration
  <chr>                        <chr>    <int> <dbl> <dbl>    <dbl>
1 K8053121_serial-food-depr_04 Protrude     1  389.  409.     19.7
2 K8060221_serial-food-depr_01 Protrude     1  766.  843.     77.2
3 K8060221_serial-food-depr_02 Protrude     1  654.  676.     22.3
4 K8060221_serial-food-depr_04 Protrude     1  432.  464.     32.0

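The problem is that this data frame collapses multiple instances of a behavior into a single row instead of keeping them separate. For example, the data frame above should be: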

  File                         Behavior    id START  STOP duration
  <chr>                        <chr>    <int> <dbl> <dbl>    <dbl>
1 K8053121_serial-food-depr_04 Protrude     1  389.  409.     20
2 K8060221_serial-food-depr_01 Protrude     1  669.  757.     88
3 K8060221_serial-food-depr_01 Protrude     2  864.  929.     65
4 K8060221_serial-food-depr_02 Protrude     1  477.  502.     25
5 K8060221_serial-food-depr_02 Protrude     2  562.  570.     8
6 K8060221_serial-food-depr_02 Protrude     3  924.  958.     34

And so on...

Here is what I tried:

protrude_data <- subset(boris_df, Behavior == "Protrude") %>%
  mutate(id = rleid(Behavior)) %>%
  group_by(File,id) %>% 
  pivot_wider(id_cols = c("File","Behavior", "id"),
              names_from = "Status",
              values_from = "Time",
              values_fn = list(Time = mean)) %>%
  mutate(duration = STOP - START)

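This approach worked for an earlier example, because there I had different behaviors, so they were numbered differently. Here every row has the same Behavior, so rleid(Behavior) returns 1 for all of them. I am not sure how to modify id to do what I want.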

The intermediate step, where I run only these lines:

protrude_data <- subset(boris_df, Behavior == "Protrude") %>%
  mutate(id = rleid(Behavior))


gives:

  File                          Time Behavior Status   diff    id
  <chr>                        <dbl> <chr>    <chr>   <dbl> <int>
1 K8053121_serial-food-depr_04  389. Protrude START   6.25      1
2 K8053121_serial-food-depr_04  409. Protrude STOP    3.25      1
3 K8060221_serial-food-depr_01  669. Protrude START  19.0       1
4 K8060221_serial-food-depr_01  757. Protrude STOP    0.247     1
5 K8060221_serial-food-depr_01  864. Protrude START   8.00      1
6 K8060221_serial-food-depr_01  929. Protrude STOP    0         1

I want it to look like:

  File                          Time Behavior Status   diff    id
  <chr>                        <dbl> <chr>    <chr>   <dbl> <int>
1 K8053121_serial-food-depr_04  389. Protrude START   6.25      1
2 K8053121_serial-food-depr_04  409. Protrude STOP    3.25      1
3 K8060221_serial-food-depr_01  669. Protrude START  19.0       1
4 K8060221_serial-food-depr_01  757. Protrude STOP    0.247     1
5 K8060221_serial-food-depr_01  864. Protrude START   8.00      2
6 K8060221_serial-food-depr_01  929. Protrude STOP    0         2

And so on...

Ron*_*hah 5

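You can increment id at every 'START' value using cumsum. Because your data frame is already grouped by File (see the groups attribute in the dput output), the count restarts within each file: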

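library(dplyr)
library(tidyr)

df %>%
  filter(Behavior == "Protrude") %>%
  # each START opens a new pair, so the running count of STARTs is the pair id
  mutate(id = cumsum(Status == 'START')) %>%
  # one row per File/Behavior/id, with START and STOP times as columns
  pivot_wider(id_cols = c(File, Behavior, id),
              names_from = Status,
              values_from = Time,
              values_fn = list(Time = mean)) %>%
  mutate(duration = STOP - START) %>%
  ungroup()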

#  File                         Behavior    id START  STOP duration
#  <chr>                        <chr>    <int> <dbl> <dbl>    <dbl>
#1 K8053121_serial-food-depr_04 Protrude     1  389.  409.    19.7 
#2 K8060221_serial-food-depr_01 Protrude     1  669.  757.    88.8 
#3 K8060221_serial-food-depr_01 Protrude     2  864.  929.    65.5 
#4 K8060221_serial-food-depr_02 Protrude     1  477.  502.    24.6 
#5 K8060221_serial-food-depr_02 Protrude     2  562.  570.     8.25
#6 K8060221_serial-food-depr_02 Protrude     3  924.  958.    34.0 
#7 K8060221_serial-food-depr_04 Protrude     1  215.  283.    68.5 
#8 K8060221_serial-food-depr_04 Protrude     2  291.   NA     NA   
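
To see why the running count works, here is a minimal base R sketch on toy vectors. The file names "a" and "b" are made up for illustration, and `ave()` is used here only as a base R stand-in for what `group_by(File)` does in the pipeline; it is not part of the code above.

```r
# Toy data: hypothetical file names, not taken from the question
status <- c("START", "STOP", "START", "STOP", "START", "STOP")
file   <- c("a", "a", "b", "b", "b", "b")

# TRUE/FALSE is treated as 1/0, so the running sum goes up by one at each START
is_start <- status == "START"

# Without grouping, the counter keeps running across files
id_global <- cumsum(is_start)

# ave() applies cumsum separately within each file, so the counter restarts per file
id_by_file <- ave(as.integer(is_start), file, FUN = cumsum)

print(data.frame(file, status, id_global, id_by_file))
```

If the data were not grouped by File, you would get the `id_global` numbering instead, so it may be worth adding an explicit `group_by(File)` before the `mutate()`. Also note that a trailing START with no matching STOP (row 8 in the output) ends up with NA for STOP and duration; if you don't want such rows, something like `filter(!is.na(STOP))` after the pivot should drop them.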