I have a data frame (my_data) and want to sum only the 3 highest values for each city, even when there may be ties. I'm very new to R and have been using dplyr.
A tibble: 15 x 3
city month number
<chr> <chr> <dbl>
1 Lund jan 12
2 Lund feb 12
3 Lund mar 18
4 Lund apr 28
5 Lund may 28
6 Stockholm jan 15
7 Stockholm feb 15
8 Stockholm mar 30
9 Stockholm apr 30
10 Stockholm may 10
11 Uppsala jan 22
12 Uppsala feb 30
13 Uppsala mar 40
14 Uppsala apr 60
15 Uppsala may 30
This is the code I tried:
# For each city, count the top 3 of variable number
my_data %>% group_by(city) %>% top_n(3, number) %>% summarise(top_nr = sum(number))
The expected (desired) output is:
# A tibble: 3 x 2
city top_nr
<chr> <dbl>
1 Lund 86
2 Stockholm 75
3 Uppsala 130
But the actual R output is:
# A tibble: 3 x 2
city top_nr
<chr> <dbl>
1 Lund 86
2 Stockholm 90
3 Uppsala 160
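It seems that when there is a tie, all of the tied values are included in the sum. I only want to sum the 3 unique instances with the highest values.
Any help would be much appreciated! :)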
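The tie behaviour can be reproduced on Stockholm's values alone. This is a minimal sketch assuming that `top_n(n, wt)` keeps every row whose `min_rank(desc(wt))` is at most `n`, which is how it ends up returning extra rows when there are ties; the vector name `stockholm` is only for illustration.

```r
library(dplyr)

# Stockholm's five monthly values from the sample data (jan..may)
stockholm <- c(15, 15, 30, 30, 10)

# min_rank() gives tied values the same (lowest) rank and leaves a gap after them
ranks <- min_rank(desc(stockholm))
print(ranks)     # 3 3 1 1 5

# Every value ranked <= 3 survives, so both 15s are kept alongside both 30s
kept <- stockholm[ranks <= 3]
print(sum(kept)) # 90
```

Because both 15s share rank 3, four rows pass the filter, which is where 30 + 30 + 15 + 15 = 90 comes from.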
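We can use distinct() to remove the duplicated elements first. top_n() works in such a way that, if values are duplicated (tied), it keeps going and returns all of the duplicate rows.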
my_data %>%
distinct(city, number, .keep_all = TRUE) %>%
group_by(city) %>%
top_n(3, number) %>%
summarise(top_nr = sum(number))
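Applied to the sample data, the distinct() version sums the three largest *distinct* values per city: Lund becomes 12 + 18 + 28 = 58, Stockholm 10 + 15 + 30 = 55 and Uppsala 60 + 40 + 30 = 130. The sketch below rebuilds the thread's data inline (as doubles rather than integers) so it runs on its own; the name `res_distinct` is just for illustration.

```r
library(dplyr)

# Same values as the dput() data in this thread, rebuilt compactly
my_data <- data.frame(
  city   = rep(c("Lund", "Stockholm", "Uppsala"), each = 5),
  month  = rep(c("jan", "feb", "mar", "apr", "may"), times = 3),
  number = c(12, 12, 18, 28, 28, 15, 15, 30, 30, 10, 22, 30, 40, 60, 30)
)

res_distinct <- my_data %>%
  distinct(city, number, .keep_all = TRUE) %>%  # one row per distinct value per city
  group_by(city) %>%
  top_n(3, number) %>%                          # no ties remain, so at most 3 rows
  summarise(top_nr = sum(number))

print(res_distinct)  # top_nr: Lund 58, Stockholm 55, Uppsala 130
```

This matches the "unique values" reading of the question, not the expected 75 for Stockholm.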
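Based on the OP's updated expected output: the rows returned by top_n() are not arranged, so sort 'number' in descending order within each city and then sum the first 3 values of 'number'.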
my_data %>%
group_by(city) %>%
top_n(3, number) %>%
arrange(city, desc(number)) %>%
summarise(number = sum(head(number, 3)))
# A tibble: 3 x 2
# city number
# <chr> <int>
#1 Lund 74
#2 Stockholm 75
#3 Uppsala 130
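In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which makes the tie handling explicit: with `with_ties = FALSE` it returns exactly `n` rows per group. A minimal sketch, assuming dplyr >= 1.0.0 and the thread's data rebuilt inline; `res_slice` is an illustrative name.

```r
library(dplyr)  # slice_max() needs dplyr >= 1.0.0

my_data <- data.frame(
  city   = rep(c("Lund", "Stockholm", "Uppsala"), each = 5),
  month  = rep(c("jan", "feb", "mar", "apr", "may"), times = 3),
  number = c(12, 12, 18, 28, 28, 15, 15, 30, 30, 10, 22, 30, 40, 60, 30)
)

res_slice <- my_data %>%
  group_by(city) %>%
  # keep exactly 3 rows per city; a tie at the cut-off is broken, not expanded
  slice_max(number, n = 3, with_ties = FALSE) %>%
  summarise(top_nr = sum(number))

print(res_slice)  # top_nr: Lund 74, Stockholm 75, Uppsala 130
```

Which of two tied rows is dropped does not matter here, since only the values are summed.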
my_data <- structure(list(city = c("Lund", "Lund", "Lund", "Lund", "Lund",
"Stockholm", "Stockholm", "Stockholm", "Stockholm", "Stockholm",
"Uppsala", "Uppsala", "Uppsala", "Uppsala", "Uppsala"), month = c("jan",
"feb", "mar", "apr", "may", "jan", "feb", "mar", "apr", "may",
"jan", "feb", "mar", "apr", "may"), number = c(12L, 12L, 18L,
28L, 28L, 15L, 15L, 30L, 30L, 10L, 22L, 30L, 40L, 60L, 30L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"))
Life might be simpler without top_n():
my_data %>%
group_by(city) %>%
summarize(
top_nr = sum(tail(sort(number), 3))
)
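The same sort-then-take-the-last-three idea also works without dplyr. This base R sketch swaps in aggregate() with the formula interface and an anonymous function; the data frame is again rebuilt inline so the block is self-contained, and `res_base` is an illustrative name.

```r
my_data <- data.frame(
  city   = rep(c("Lund", "Stockholm", "Uppsala"), each = 5),
  month  = rep(c("jan", "feb", "mar", "apr", "may"), times = 3),
  number = c(12, 12, 18, 28, 28, 15, 15, 30, 30, 10, 22, 30, 40, 60, 30)
)

# For each city, sort ascending and sum the last (i.e. largest) 3 values;
# ties are harmless because exactly 3 elements are always taken
res_base <- aggregate(number ~ city, data = my_data,
                      FUN = function(x) sum(tail(sort(x), 3)))

print(res_base)  # number: Lund 74, Stockholm 75, Uppsala 130
```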