I have this data frame:
> head(merged.tables)
Store DayOfWeek Date Sales Customers Open Promo StateHoliday SchoolHoliday StoreType
1 1 5 2015-07-31 5263 555 1 1 0 1 c
2 1 6 2013-01-12 4952 646 1 0 0 0 c
3 1 5 2014-01-03 4190 552 1 0 0 1 c
4 1 3 2014-12-03 6454 695 1 1 0 0 c
5 1 3 2013-11-13 3310 464 1 0 0 0 c
6 1 7 2013-10-27 0 0 0 0 0 0 c
Assortment CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2
1 a 1270 9 2008 0
2 a 1270 9 2008 0
3 a 1270 9 2008 0
4 a 1270 9 2008 0
5 a 1270 9 2008 0
6 a 1270 9 2008 0
Promo2SinceWeek Promo2SinceYear PromoInterval
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
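Then I want to extract a data frame that shows the mean of the Sales vector by StoreType when Open equals 1. I used this command because I think it is the fastest option: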
merged.tables[StateHoliday==1,mean(na.omit(Sales)),by=StoreType]
But I got this error:
Error in `[.data.frame`(merged.tables, StateHoliday == 0, mean(na.omit(Sales)),  : unused argument (by = StoreType)
I searched but did not find an answer for this error. Thanks for your help!
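There are many ways to apply a function to groups of values in a data frame. I'll present two: one with dplyr and one with base R's tapply().
The goal: for each store type, the average sales of the stores whose Open value equals 1.
Note: the data frame below uses only a few of the columns posted by the OP.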
# install necessary package
install.packages( pkgs = "dplyr" )

# load necessary package
library( dplyr )

# create data frame
# note: Sales is drawn at random, so your values will differ from those shown below
merged.tables <-
  data.frame(
    Store = c( 1, 1, 1, 2, 2, 2 )
    , StoreType = rep( x = c( "s", "m", "l" ) , times = 2 )
    , Sales = round( x = runif( n = 6, min = 3000, max = 6000 ) , digits = 0 )
    , Open = c( 1, 1, 0, 0, 1, 1 )
    , stringsAsFactors = FALSE
  )

# view the data
merged.tables
#   Store StoreType Sales Open
# 1     1         s  4608    1
# 2     1         m  4017    1
# 3     1         l  4210    0
# 4     2         s  4833    0
# 5     2         m  3818    1
# 6     2         l  3090    1

# dplyr method
merged.tables %>%
  group_by( StoreType ) %>%
  filter( Open == 1 ) %>%
  summarise( AverageSales = mean( x = Sales , na.rm = TRUE ) )
# A tibble: 3 x 2
#   StoreType AverageSales
#   <chr>            <dbl>
# 1 l                 3090
# 2 m                 3918
# 3 s                 4608


# tapply method

# create the condition
# that 'Open' must be equal to one
Open.equals.one <- which( merged.tables$Open == 1 )

# apply the condition to
# both X and INDEX
tapply( X = merged.tables$Sales[ Open.equals.one ]
        , INDEX = merged.tables$StoreType[ Open.equals.one ]
        , FUN = mean
        , na.rm = TRUE # in case the `Sales` column contains NA values, this removes them from the calculation
)
#      l      m      s
# 3090.0 3917.5 4608.0

# end of script #

If you need more conditions later, I encourage you to check out other relevant SO posts, such as "How to combine multiple conditions to subset a data-frame using "OR"?" and "Why is `[` better than `subset`?".
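As for the error itself: the `[i, j, by = ...]` form used in the question is data.table syntax. On a plain data.frame, `[` has no `by` argument, which is why R reports "unused argument". Note also that the call in the question filters on StateHoliday, while the description asks for Open equal to 1. Below is a minimal sketch of the data.table route, assuming the data.table package is installed. The small data frame and the `dt` and `result` names are made up for illustration and reuse the sample values shown above.

```r
# assumes the data.table package is installed
library(data.table)

# illustrative data only: fixed values copied from the sample output above
merged.tables <- data.frame(
  StoreType = c("s", "m", "l", "s", "m", "l"),
  Sales     = c(4608, 4017, 4210, 4833, 3818, 3090),
  Open      = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# convert to a data.table so that `[` understands i, j and by
dt <- as.data.table(merged.tables)

# i: keep rows where Open equals 1
# j: compute the mean of Sales, ignoring NA values
# by: one result row per StoreType
result <- dt[Open == 1,
             .(AverageSales = mean(Sales, na.rm = TRUE)),
             by = StoreType]

print(result)
# groups appear in order of first occurrence: s = 4608, m = 3917.5, l = 3090
```

If the real `merged.tables` is large, `setDT(merged.tables)` converts it in place instead of making a copy, which may be preferable.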