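The short version is that I'm having trouble combining conditional counts with aggregate functions in a single summary over the same grouping factors.
Suppose I have this data frame: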
library(dplyr)
df = tbl_df(data.frame(
company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
year=c("2011", "2010", "2009", "2011", "2010", "2013"),
product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
"Kindness", "Helping Hand"),
price=c("5.67", "7.12", "12.99", "10.99", NA, FALSE)))
This creates (essentially) this data frame:
company year product price
1 Acme 2011 Wrench 5.67
2 Meca 2010 Hammer 7.12
3 Emca 2009 Sonic Screwdriver 12.99
4 Acme 2011 Fairy Dust 10.99
5 Meca 2010 Kindness NA
... ... ... ... ...
n Emca 2013 Helping Hand FALSE
Suppose I want to run df <- group_by(df, company, year, product) and get the following information in one set (i.e. one data frame):
maximum price
summarize(df, count = n()) #satisfies first item obviously
I'm having trouble getting the others. I think I need to use the pipe operator? If so, could someone offer some guidance?
Here is what I tried. It is plainly wrong, but I don't know what to do next:
summarize(df,
total.count = n(),
count = filter(df, is.na(price)),
avg.price = filter(df, !is.na(price), price != FALSE),
max.price = max(filter(df, !is.na(price), price != FALSE))
Yes, I have gone through the documentation, and I'm sure the answer is in there, but it may be too advanced for my current understanding. Thanks in advance!
akr*_*run 53
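Assuming your original dataset is similar to the one you created (i.e. with NA stored as a character string), you could specify na.strings when reading the data with read.table. That said, I would expect NA to be detected automatically.
The price column is a factor and needs to be converted to numeric class. With as.numeric, all non-numeric elements (i.e. "NA" and FALSE) are coerced to NA, with a warning.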
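As a minimal sketch of the na.strings idea: the inline text below is a hypothetical stand-in for a file on disk, and treating "FALSE" as a missing-value marker is an assumption based on the sample data in the question.

```r
# Hypothetical inline data standing in for a real file.
raw <- "company year price
Acme 2011 5.67
Meca 2010 NA
Emca 2013 FALSE"

# na.strings lists every token that should be read as missing,
# so the price column can be parsed as numeric straight away.
prices <- read.table(text = raw, header = TRUE,
                     na.strings = c("NA", "FALSE"),
                     stringsAsFactors = FALSE)

str(prices$price)
sum(is.na(prices$price))
```

Handling the markers at read time means no conversion step is needed later in the pipeline.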
library(dplyr)
df %>%
mutate(price=as.numeric(as.character(price))) %>%
group_by(company, year, product) %>%
summarise(total.count=n(),
count=sum(is.na(price)),
avg.price=mean(price,na.rm=TRUE),
max.price=max(price, na.rm=TRUE))
I used the same dataset (except for the ... row).
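The reason for going through as.character first can be seen on a standalone vector. This is a small illustration rather than part of the pipeline above, and the vector simply mirrors the price column from the question. On R 4.0 and later, data.frame no longer creates factors by default, in which case the column is already character and the as.character call is harmless.

```r
price <- factor(c("5.67", "7.12", "12.99", "10.99", "NA", "FALSE"))

# as.numeric() applied directly to a factor returns the internal
# integer level codes, not the numbers shown in the labels.
codes <- as.numeric(price)

# Converting to character first recovers the actual values.
# "NA" and "FALSE" are not numbers, so they come back as NA;
# suppressWarnings() hides the coercion warning for this demo.
values <- suppressWarnings(as.numeric(as.character(price)))

codes
values
```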
df = tbl_df(data.frame(company=c("Acme", "Meca", "Emca", "Acme", "Meca","Emca"),
year=c("2011", "2010", "2009", "2011", "2010", "2013"), product=c("Wrench", "Hammer",
"Sonic Screwdriver", "Fairy Dust", "Kindness", "Helping Hand"), price=c("5.67",
"7.12", "12.99", "10.99", "NA",FALSE)))
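One thing to watch for with this particular dataset: every company/year/product combination is a single row, so the two groups whose price is missing have no values left once na.rm=TRUE drops the NA. The sketch below shows what base R returns in that case. safe_max is a hypothetical helper, not part of dplyr, and returning NA for an empty group is only one possible convention.

```r
all_missing <- c(NA_real_, NA_real_)

# With every value removed, mean() has nothing to average.
avg <- mean(all_missing, na.rm = TRUE)

# max() of an empty set returns -Inf and emits a warning,
# which is suppressed here to keep the demo quiet.
mx <- suppressWarnings(max(all_missing, na.rm = TRUE))

# Hypothetical helper: return NA when a group has no usable values.
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

avg
mx
safe_max(all_missing)
safe_max(c(1, NA, 3))
```

A helper like this could be called inside summarise in place of max(price, na.rm=TRUE) if -Inf is not a useful result for empty groups.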