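In my data frame, I have several columns containing student grades. I want to sum the "Quiz" columns (e.g., Quiz1, Quiz2). However, I only want to sum the top 2 values in each row and ignore the rest. I want to create a new column containing the total (i.e., the sum of the top 2 values).

One complication is that some students have scores tied for the top 2 in a given row. For example, Aaron has a high score of 42, but then two scores tied for second highest (i.e., 36).

Data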
df <- 
  structure(
    list(
      Student = c("Aaron", "James", "Charlotte", "Katie", "Olivia", 
                  "Timothy", "Grant", "Chloe", "Judy", "Justin"),
      ID = c(30016, 87311, 61755, 55323, 94839, 38209, 34096, 
             98432, 19487, 94029),
      Quiz1 = c(31, 25, 41, 10, 35, 19, 27, 42, 15, 20),
      Quiz2 = c(42, 33, 34, 22, 23, 38, 48, 49, 23, 30),
      Quiz3 = c(36, 36, 34, 32, 43, 38, 44, 42, 42, 37),
      Quiz4 = c(36, 43, 39, 46, 40, 38, 43, 35, 41, 41)
    ),
    row.names = c(NA, -10L),
    class = c("tbl_df", "tbl", "data.frame")
  )

I know I can use pivot_longer so that I can arrange by group and then take the top 2 values for each student. This works fine, but I would like a more efficient way to do it with the tidyverse, rather than pivoting back and forth.

What I tried

library(tidyverse)

df %>%
  pivot_longer(-c(Student, ID)) %>%
  group_by(Student, ID) %>%
  arrange(desc(value), .by_group = TRUE) %>%
  slice_head(n = 2) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  ungroup() %>%
  mutate(Total = rowSums(select(., starts_with("Quiz")), na.rm = TRUE))

I also know that if I wanted to sum all of the columns in each row, I could use rowSums, as I did above. However, I am not sure how to make rowSums use only the top 2 values of the 4 quiz columns.

Expected output

# A tibble: 10 × 7
   Student      ID Quiz2 Quiz3 Quiz1 Quiz4 Total
   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Aaron     30016    42    36    NA    NA    78
 2 Charlotte 61755    NA    NA    41    39    80
 3 Chloe     98432    49    NA    42    NA    91
 4 Grant     34096    48    44    NA    NA    92
 5 James     87311    NA    36    NA    43    79
 6 Judy      19487    NA    42    NA    41    83
 7 Justin    94029    NA    37    NA    41    78
 8 Katie     55323    NA    32    NA    46    78
 9 Olivia    94839    NA    43    NA    40    83
10 Timothy   38209    38    38    NA    NA    76

library(tidyverse)

df <- 
  structure(
    list(
      Student = c("Aaron", "James", "Charlotte", "Katie", "Olivia", 
                  "Timothy", "Grant", "Chloe", "Judy", "Justin"),
      ID = c(30016, 87311, 61755, 55323, 94839, 38209, 34096, 
             98432, 19487, 94029),
      Quiz1 = c(31, 25, 41, 10, 35, 19, 27, 42, 15, 20),
      Quiz2 = c(42, 33, 34, 22, 23, 38, 48, 49, 23, 30),
      Quiz3 = c(36, 36, 34, 32, 43, 38, 44, 42, 42, 37),
      Quiz4 = c(36, 43, 39, 46, 40, 38, 43, 35, 41, 41)
    ),
    row.names = c(NA, -10L),
    class = c("tbl_df", "tbl", "data.frame")
  )

df %>%
  rowwise() %>% 
  mutate(Quiz_Total = sum(sort(c(Quiz1, Quiz2, Quiz3, Quiz4), decreasing = TRUE)[1:2])) %>% 
  ungroup()
#> # A tibble: 10 × 7
#>    Student      ID Quiz1 Quiz2 Quiz3 Quiz4 Quiz_Total
#>    <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
#>  1 Aaron     30016    31    42    36    36         78
#>  2 James     87311    25    33    36    43         79
#>  3 Charlotte 61755    41    34    34    39         80
#>  4 Katie     55323    10    22    32    46         78
#>  5 Olivia    94839    35    23    43    40         83
#>  6 Timothy   38209    19    38    38    38         76
#>  7 Grant     34096    27    48    44    43         92
#>  8 Chloe     98432    42    49    42    35         91
#>  9 Judy      19487    15    23    42    41         83
#> 10 Justin    94029    20    30    37    41         78

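The rowwise() call above lists the four quiz columns by hand. As a minimal sketch, assuming dplyr 1.0.0 or later, the same top-2 sum can pick the columns by prefix with c_across(). The three-row `scores` tibble is a hypothetical subset of the question's data, used only to keep the example short.

```r
library(dplyr)

# Hypothetical subset of the question's data (Aaron, James, Timothy).
scores <- tibble(
  Student = c("Aaron", "James", "Timothy"),
  Quiz1 = c(31, 25, 19),
  Quiz2 = c(42, 33, 38),
  Quiz3 = c(36, 36, 38),
  Quiz4 = c(36, 43, 38)
)

result <- scores %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(
    Quiz_Total = sum(sort(c_across(starts_with("Quiz")), decreasing = TRUE)[1:2])
  ) %>%
  ungroup()

result$Quiz_Total
#> [1] 78 79 76
```

Selecting by prefix means a Quiz5 column added later would be picked up without editing the mutate() call.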
You don't need pivot_wider. Note that the longer format is the tidy format. Just do pivot_longer and left_join:
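df %>%
  left_join(
    # per-student sum of the two largest quiz scores, computed in long format
    pivot_longer(., -c(Student, ID)) %>%
      group_by(Student, ID) %>%
      summarise(Total = sum(sort(value, TRUE)[1:2]), .groups = 'drop'),
    by = c("Student", "ID")
  )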
# A tibble: 10 x 7
   Student      ID Quiz1 Quiz2 Quiz3 Quiz4 Total
   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Aaron     30016    31    42    36    36    78
 2 James     87311    25    33    36    43    79
 3 Charlotte 61755    41    34    34    39    80
 4 Katie     55323    10    22    32    46    78
 5 Olivia    94839    35    23    43    40    83
 6 Timothy   38209    19    38    38    38    76
 7 Grant     34096    27    48    44    43    92
 8 Chloe     98432    42    49    42    35    91
 9 Judy      19487    15    23    42    41    83
10 Justin    94029    20    30    37    41    78
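
Both answers rely on sort(x, decreasing = TRUE)[1:2]. Ties need no special handling here, because only the two largest values enter the sum, whichever columns they come from. One caveat: sort() drops NA values by default, so for a row with fewer than two non-missing scores the [1:2] index runs past the end and the sum becomes NA. Below is a minimal sketch of a helper that sidesteps this with head(); the name top_n_sum is hypothetical and not part of any package.

```r
# Hypothetical helper: sum the n largest non-missing values of x.
top_n_sum <- function(x, n = 2) {
  # sort() drops NA by default; head() returns at most n elements,
  # so a short vector is not padded with NA the way [1:n] would pad it.
  sum(head(sort(x, decreasing = TRUE), n))
}

top_n_sum(c(31, 42, 36, 36))  # Aaron's row, tie for second place: 42 + 36
#> [1] 78
top_n_sum(c(19, 38, 38, 38))  # Timothy's row, three-way tie: 38 + 38
#> [1] 76
top_n_sum(c(40, NA, NA, NA))  # only one score present
#> [1] 40
sum(sort(c(40, NA, NA, NA), decreasing = TRUE)[1:2])  # the [1:2] form
#> [1] NA
```

If missing scores are possible in the real data, top_n_sum(value) could replace sum(sort(value, TRUE)[1:2]) inside summarise(); with complete data, as in the question, both forms give the same totals.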