I have a seemingly simple problem, but I can't figure out how to get what I want.
My data looks like this:
Job C/C++ Java Python
Student FALSE TRUE FALSE
Developer TRUE TRUE TRUE
Developer TRUE TRUE FALSE
Sysadmin TRUE FALSE FALSE
Student FALSE TRUE TRUE
I want to group by the "Job" column and count the number of TRUE values in each column. My desired output looks like this:
Job C/C++ Java Python
Student 0 2 1
Developer 2 2 1
Sysadmin 1 0 0
Any help would be greatly appreciated.
Assuming your data.frame is called "temp", just use aggregate:
aggregate(. ~ Job, temp, sum)
# Job C.C.. Java Python
# 1 Developer 2 2 1
# 2 Student 0 2 1
# 3 Sysadmin 1 0 0
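This works because, in arithmetic, TRUE and FALSE are treated as the numeric values 1 and 0, so summing a logical column within each group simply counts its TRUE values.
And, for completeness, here is a "tidyverse" solution: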
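A quick way to see this coercion at work outside of aggregate: the vector x below is made up for illustration and stands in for one column's values within one group.

```r
# Hypothetical logical vector, e.g. one group's values in one skill column
x <- c(TRUE, FALSE, TRUE, TRUE)

as.integer(x)  # TRUE/FALSE coerced to 1/0: 1 0 1 1
sum(x)         # counts the TRUE values: 3
mean(x)        # proportion of TRUE values: 0.75
```

The same coercion is what lets every sum-based answer in this thread count TRUEs without an explicit ifelse().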
library(tidyverse)
temp %>%
  group_by(Job) %>%
  summarise_all(sum)
# # A tibble: 3 x 4
# Job C.C.. Java Python
# <chr> <int> <int> <int>
# 1 Developer 2 2 1
# 2 Student 0 2 1
# 3 Sysadmin 1 0 0
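Since dplyr 1.0.0, scoped verbs such as summarise_all() are superseded (they still work) in favour of across(). A minimal sketch of the same count written with across(), assuming dplyr 1.0.0 or later is installed; the temp data frame is rebuilt inline so the snippet runs on its own:

```r
library(dplyr)

# Same data as `temp` in this thread (logical skill columns)
temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  C.C..  = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# across(everything(), sum) applies sum() to every column except the
# grouping column Job, counting the TRUE values per group
counts <- temp %>%
  group_by(Job) %>%
  summarise(across(everything(), sum))
counts
```

Because the grouping column is excluded from everything() automatically, there is no need to drop Job by hand.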
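Here is your data in a format that is easy to copy and paste. It was obtained with dput(your-actual-data-frame-name), which is what you should use in the future when posting R questions to Stack Overflow.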
temp <- structure(list(Job = c("Student", "Developer", "Developer", "Sysadmin",
"Student"), C.C.. = c(FALSE, TRUE, TRUE, TRUE, FALSE), Java = c(TRUE,
TRUE, TRUE, FALSE, TRUE), Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)),
.Names = c("Job", "C.C..", "Java", "Python"), class = "data.frame",
row.names = c(NA, -5L))
Alternative plyr and data.table solutions:
data.table:
require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, sum), by=Job]
# Job C.C.. Java Python
# 1: Developer 2 2 1
# 2: Student 0 2 1
# 3: Sysadmin 1 0 0
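A variant sketch that skips setting a key up front: keyby = Job groups by Job and sorts the result by it in one step, while .SD holds each group's non-grouping columns, so lapply(.SD, sum) sums every skill column. It assumes the data.table package is installed and rebuilds temp inline:

```r
library(data.table)

# Same data as `temp` in this thread
temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  C.C..  = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

dt <- as.data.table(temp)

# keyby groups and sorts by Job; .SD is each group's remaining columns
res <- dt[, lapply(.SD, sum), keyby = Job]
res
```

With plain by = Job instead of keyby, the groups would appear in order of first occurrence (Student first) rather than sorted.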
plyr:
require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1]))
# Job C.C.. Java Python
# 1 Developer 2 2 1
# 2 Student 0 2 1
# 3 Sysadmin 1 0 0
Edit: If, instead of TRUE/FALSE, your columns contain values such as "Newbie" and you want to count the number of "Newbie"s, then:
With data.table:
require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, function(x) sum(x == "Newbie")), by=Job]
With plyr:
require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1] == "Newbie"))
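The edit above does not show data containing "Newbie", so the skills data frame below is hypothetical (its levels "None", "Newbie" and "Expert" are made up). As a base-R sketch under that assumption, comparing the skill columns with "Newbie" turns them into TRUE/FALSE, after which the aggregate() approach from the first answer applies unchanged:

```r
# Hypothetical data: each skill column holds a level instead of TRUE/FALSE
skills <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  C.C..  = c("None", "Expert", "Newbie", "Expert", "Newbie"),
  Java   = c("Newbie", "Expert", "Expert", "None", "Newbie"),
  Python = c("None", "Newbie", "None", "None", "Newbie")
)

# skills[-1] == "Newbie" is a logical matrix ("is this a Newbie?");
# data.frame() keeps its column names and puts Job back in front
is_newbie <- data.frame(Job = skills$Job, skills[-1] == "Newbie")

# Count the TRUEs per Job exactly as for the original logical data
newbies <- aggregate(. ~ Job, is_newbie, sum)
newbies
```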