Getting group-level counts from a data frame in R when each individual has multiple observations

gol*_*ine 11 r dataframe

How can I get a data frame like this:

soccer_player country position
"sam"         USA     left defender
"jon"         USA     right defender
"sam"         USA     left midfielder
"jon"         USA     offender
"bob"         England goalie
"julie"       England central midfielder
"jane"        England goalie

to look like this (one row per country, with the number of unique players in that country):

country player_count
USA     2
England 3

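The obvious complication is that there are multiple observations per player, so I can't simply use table(df$country), which would give the number of observations per country rather than the number of players.

I've been playing around with the table() and merge() functions, but with no luck.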

Ben*_*ker 7

New features in dplyr 0.3 provide a compact solution:

Data:

dd <- read.csv(text='
soccer_player,country,position
"sam",USA,left defender
"jon",USA,right defender
"sam",USA,left midfielder
"jon",USA,offender
"bob",England,goalie
"julie",England,central midfielder
"jane",England,goalie')

Code:

library(dplyr)

dd %>% distinct(soccer_player,country) %>% 
       count(country)
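If the result column should be named player_count as in the question, one possible variation is to group by country and count distinct players directly. This is only a sketch, assuming dplyr is installed; `result` is an illustrative name that does not appear in the original answer.

```r
library(dplyr)

# Same example data as above.
dd <- read.csv(text = 'soccer_player,country,position
sam,USA,left defender
jon,USA,right defender
sam,USA,left midfielder
jon,USA,offender
bob,England,goalie
julie,England,central midfielder
jane,England,goalie')

# n_distinct() counts the unique players within each country group,
# so repeated rows for the same player are only counted once.
result <- dd %>%
  group_by(country) %>%
  summarise(player_count = n_distinct(soccer_player))

result
```

This avoids the separate distinct() step, at the cost of being slightly longer than the count() version.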


Mat*_*rde 6

Here's one way:

as.data.frame(table(unique(d[-3])$country))
#      Var1 Freq
# 1 England    3
# 2     USA    2

This drops the third column, removes any duplicate player-country pairs, and then counts the number of occurrences of each country.
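The three steps can be written out one at a time to make the one-liner easier to follow. This is a sketch in base R; the intermediate names `players`, `pairs`, and `counts` are illustrative, and the final renaming step is an optional addition to match the column names requested in the question.

```r
# Rebuild the example data (named `d` to match the one-liner).
d <- read.csv(text = 'soccer_player,country,position
sam,USA,left defender
jon,USA,right defender
sam,USA,left midfielder
jon,USA,offender
bob,England,goalie
julie,England,central midfielder
jane,England,goalie')

# Step 1: drop the third column (position).
players <- d[-3]

# Step 2: keep one row per distinct player-country pair.
pairs <- unique(players)

# Step 3: count the remaining rows per country.
counts <- as.data.frame(table(pairs$country))

# Optional: rename the default Var1/Freq columns to the names
# used in the question.
names(counts) <- c("country", "player_count")
counts
```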


Señ*_*r O 6

Without using any packages, you could do:

List = by(df, df$country, function(x) length(unique(x$soccer_player)))
DataFrame = do.call(rbind, lapply(names(List), function(x) 
  data.frame(country=x, player_count=List[[x]])))
#  country player_count
#1 England            3
#2     USA            2

It's easier with data.table:

library(data.table)

dt = data.table(df)
dt[,list(player_count = length(unique(soccer_player))),by=country]
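The data.table approach can also be written with uniqueN(), which counts distinct values in one call. This is a sketch rather than part of the original answer: it assumes a data.table version that provides uniqueN() (1.9.6 or later), and `res` is an illustrative name.

```r
library(data.table)

# Example data from the question, built directly as a data.table.
dt <- data.table(
  soccer_player = c("sam", "jon", "sam", "jon", "bob", "julie", "jane"),
  country       = c("USA", "USA", "USA", "USA",
                    "England", "England", "England"),
  position      = c("left defender", "right defender", "left midfielder",
                    "offender", "goalie", "central midfielder", "goalie")
)

# uniqueN(x) counts distinct values; here it plays the same role as
# length(unique(x)) within each country group.
res <- dt[, .(player_count = uniqueN(soccer_player)), by = country]
res
```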