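I need to select the top two values for each group (each `yearmonth` value) from the following data frame in R. I have already sorted the data by count and yearmonth. How can I achieve this with the data below?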
   yearmonth            name count
 1    201310           Dovas     5
 2    201310         Indulgd     2
 3    201310         Justina     1
 4    201310          Jolita     1
 5    201311 Shahrukh Sheikh     1
 6    201311           Dovas    29
 7    201311         Justina    13
 8    201311            Lina     8
 9    201312         sUPERED     7
10    201312     John Hansen     7
11    201312         Lina D.     6
12    201312       joanna1st     5
Or use data.table (mydf is from @jazzurro's post). Some options:
library(data.table)
setDT(mydf)[order(yearmonth,-count), .SD[1:2], by=yearmonth]
Or
setDT(mydf)[mydf[order(yearmonth, -count), .I[1:2], by=yearmonth]$V1,]
Or
setorder(setkey(setDT(mydf), yearmonth), yearmonth, -count)[
  , .SD[1:2], by=yearmonth]
# yearmonth name count
#1: 201310 Dovas 5
#2: 201310 Indulgd 2
#3: 201311 Dovas 29
#4: 201311 Justina 13
#5: 201312 sUPERED 7
#6: 201312 John Hansen 7
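
One caveat worth illustrating: `.SD[1:2]` indexes rows 1 and 2 even when a group has only one row, which pads the result with an `NA` row. The sketch below uses a three-row subset of the question's data (the names `dt`, `padded`, and `trimmed` are made up for illustration) and shows `head(.SD, 2)` as one possible way around it. This is an optional variation, not part of the original answer.

```r
library(data.table)

# Illustrative subset: 201311 has only one row
dt <- data.table(yearmonth = c("201310", "201310", "201311"),
                 name      = c("Dovas", "Indulgd", "Lina"),
                 count     = c(5, 2, 8))

# .SD[1:2] asks for row 2 even where it does not exist, giving an NA row
padded <- dt[order(yearmonth, -count), .SD[1:2], by = yearmonth]

# head(.SD, 2) returns at most two rows per group, without padding
trimmed <- dt[order(yearmonth, -count), head(.SD, 2), by = yearmonth]

print(padded)
print(trimmed)
```

With the full data in the question every group has four rows, so both forms give the same result there.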
Here is one way:
library(dplyr)
mydf %>%
  group_by(yearmonth) %>%
  arrange(desc(count)) %>%
  slice(1:2)
# yearmonth name count
#1 201310 Dovas 5
#2 201310 Indulgd 2
#3 201311 Dovas 29
#4 201311 Justina 13
#5 201312 sUPERED 7
#6 201312 John Hansen 7
Data
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
                   name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
                            "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
                            "Lina D.", "joanna1st"),
                   count = c(5,2,1,1,1,29,13,8,7,7,6,5),
                   stringsAsFactors = FALSE)
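
For comparison, the same selection can be sketched in base R without extra packages. This is only one possible approach, not taken from the answers above; the helper names `sorted`, `rank_in_group`, and `top2` are made up for illustration. It sorts by `yearmonth` and descending `count`, numbers the rows within each group with `ave()`, and keeps the first two.

```r
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
                   name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
                            "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
                            "Lina D.", "joanna1st"),
                   count = c(5,2,1,1,1,29,13,8,7,7,6,5),
                   stringsAsFactors = FALSE)

# Sort by group, then by count in descending order
sorted <- mydf[order(mydf$yearmonth, -mydf$count), ]

# Number the rows 1, 2, 3, ... within each yearmonth group
rank_in_group <- ave(seq_len(nrow(sorted)), sorted$yearmonth, FUN = seq_along)

# Keep the first two rows of every group
top2 <- sorted[rank_in_group <= 2, ]
print(top2)
```

Ties (such as the two counts of 7 in 201312) are kept in their original order, because `order()` leaves unresolved ties as they were.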