How to retrieve the most repeated value in a column of a data frame

Question

I am trying to retrieve the most repeated value in a particular column of a data frame. My sample data and code are below.

data("Forbes2000", package = "HSAUR")
head(Forbes2000)


  rank                name        country             category  sales profits  assets marketvalue
1    1           Citigroup  United States              Banking  94.71   17.85 1264.03      255.30
2    2    General Electric  United States        Conglomerates 134.19   15.59  626.93      328.54
3    3 American Intl Group  United States            Insurance  76.66    6.46  647.66      194.87
4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96  166.99      277.02
5    5                  BP United Kingdom Oil & gas operations 232.57   10.27  177.57      173.54
6    6     Bank of America  United States              Banking  49.01   10.81  736.45      117.55

Going by my sample data, I need to return the most repeated category, which is Insurance.

# rows for one country only; the most repeated category is wanted from this subset
subset(Forbes2000, country == "Bermuda")

Answer 1

小智 15

tail(names(sort(table(Forbes2000$category))), 1)
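
To see what the one-liner does step by step, here is a minimal sketch on a made-up vector (the `category` vector below is hypothetical, not the Forbes2000 data). `table()` counts each distinct value and `sort()` orders the counts ascending, so the last name is the most frequent one. `names(which.max(...))` should give the same answer in base R. Both return a single value even when several values are tied.

```r
# Hypothetical sample values standing in for Forbes2000$category
category <- c("Banking", "Insurance", "Insurance",
              "Oil & gas operations", "Banking", "Insurance")

counts <- table(category)   # frequency of each distinct value
counts

# The answer's approach: sort counts ascending and take the last name
most_common_sorted <- tail(names(sort(counts)), 1)

# Base R shortcut: name of the (first) maximum count
most_common_max <- names(which.max(counts))

most_common_sorted
most_common_max
```

On this vector both expressions return "Insurance", which occurs three times.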

Answer 2

Jos*_*ien 9

If two or more categories could be tied for most frequent, use something like this:

x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
tt <- table(x)
names(tt[tt==max(tt)])
# [1] "Food markets" "Insurance"
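
The same idea can be wrapped in a small function and applied to a data frame column after subsetting by country, which is closer to what the question asks. This is only a sketch: `most_frequent` is a hypothetical helper name, and `df` is a made-up data frame that mimics the `country` and `category` columns rather than the real Forbes2000 data.

```r
# Hypothetical helper: returns every value tied for the highest count
most_frequent <- function(values) {
  tt <- table(values)
  names(tt[tt == max(tt)])
}

# Small made-up data frame mimicking the country/category columns
df <- data.frame(
  country  = c("Bermuda", "Bermuda", "Bermuda", "United States", "United States"),
  category = c("Insurance", "Insurance", "Banking", "Banking", "Conglomerates"),
  stringsAsFactors = FALSE
)

bermuda <- subset(df, country == "Bermuda")
most_frequent(bermuda$category)   # single winner within the subset
most_frequent(df$category)        # tie across the whole data frame
```

For the subset this gives "Insurance"; for the whole made-up data frame it gives both "Banking" and "Insurance", since each appears twice.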

Answer 3

tuc*_*son 5

Another way, using the data.table package, which is faster for large data sets:

set.seed(1)
x=sample(seq(1,100), 5000000, replace = TRUE)

Method 1 (the solution proposed above)

start.time <- Sys.time()
tt <- table(x)
names(tt[tt==max(tt)])
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 4.883488 secs

Method 2 (data.table)

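library(data.table)  # provides data.table(), setkey() and .N

start.time <- Sys.time()
ds <- data.table(x)
setkey(ds, x)                      # sorts ds by x
sorted <- ds[, .N, by = list(x)]   # one row per distinct value, with its count N

# order by descending count and take the first value (only one value, even in a tie)
most_repeated_value <- sorted[order(-N)]$x[1]
most_repeated_value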

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 0.328033 secs

tucson, nice. I think `as.data.table(ds)[, .N, by=x][, x[N == max(N)]]` does the job too, and it takes 0.06 seconds on my laptop. FYI, there is no need to `setkey` for the aggregation. (5 upvotes)
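
For anyone who wants to try the chained form from this comment on something small, here is a rough sketch. It assumes the data.table package is installed; the short vector `x` is made up to include a tie, and the names `dt`, `counts` and `top` are only illustrative.

```r
library(data.table)

# Small made-up vector with a tie between 2 and 3
x <- c(1, 2, 2, 3, 3)

dt <- data.table(x)
counts <- dt[, .N, by = x]        # count rows per distinct value; no setkey() needed
top <- counts[, x[N == max(N)]]   # keep every value whose count equals the maximum
top
```

Unlike taking only the first row after ordering by count, this keeps all tied values, here 2 and 3.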
