Using rowMeans within dplyr

Ves*_*cio 11 r dplyr

I have been trying to calculate rowMeans inside dplyr's mutate function, but I keep getting errors. Below is an example dataset and the desired result.

DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                  DATE = c("1","1","2","2","3","3","3","4","4"), 
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                    DATE = c("1","1","2","2","3","3","3","4","4"), 
                    STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                    STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                    NAYSA = c(1.5, 3, 45, 60, 150, 300, 450, 7500, 9000))

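The code I have written so far starts by randomly sampling STUFF and STUFF2. I would then like to calculate the rowMeans of STUFF and STUFF2 and write the result to a new column. I can do this with tidyr, but I would have to redo more variables. I could also use base R, but I would prefer a solution that uses the mutate function in dplyr. Thanks in advance.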

RESULT = group_by(DATA, SITE, DATE) %>%
  mutate(STUFF=sample(STUFF,replace= TRUE), STUFF2 = sample(STUFF2,replace= TRUE))%>%
  # Each of the following attempts, tried one at a time as the last step, returns an error
  mutate(NAYSA = rowMeans(DATA[,-1:-2]))
  mutate(NAYSA = rowMeans(.[,-1:-2])) 
  mutate (NAYSE = rowMeans(.))

Ves*_*cio 11

@GregF Yep... ungroup() was the key. Thanks.

Working code

RESULT = group_by(DATA, SITE, DATE) %>% 
  mutate(STUFF = sample(STUFF,replace= TRUE), 
         STUFF2 = sample(STUFF2,replace= TRUE)) %>% 
  ungroup() %>% 
  mutate(NAYSA = rowMeans(.[,-1:-2]))

  • Using select would be cleaner: `mutate(NAYSA = rowMeans(select(., STUFF, STUFF2)))` (2 upvotes)

Lyz*_*deR 9

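You need dplyr's rowwise function to do this. Your data is random (because of sample), so it will produce different results each time, but you can see that it works: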

library(dplyr)

group_by(DATA, SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE), STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  rowwise() %>%
  mutate(NAYSA = mean(c(STUFF, STUFF2)))

Output:

Source: local data frame [9 x 5]
Groups: <by row>

  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      2    2.0
3    A    2    30     80   55.0
4    A    2    30     60   45.0
5    B    3   200    600  400.0
6    B    3   300    200  250.0
7    B    3   100    600  350.0
8    C    4  5000  12000 8500.0
9    C    4  6000  10000 8000.0

As you can see, it computes the mean of STUFF and STUFF2 for each row.

  • Correct, but ungrouping and using rowMeans is probably much faster (untested, though). (2 upvotes)
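To make the comparison in the comment above concrete, here is a minimal sketch that checks the two approaches agree on the same sampled data. The names SAMPLED, BY_ROW and VECTORISED are illustrative and not from the thread, and set.seed(1) is only there to make the run repeatable. It does not measure speed.

```r
library(dplyr)

DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

set.seed(1)  # illustrative seed so the resampling is repeatable

# Resample within each SITE/DATE group once, then drop the grouping
SAMPLED <- DATA %>%
  group_by(SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE),
         STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  ungroup()

# Approach 1: rowwise() with mean()
BY_ROW <- SAMPLED %>%
  rowwise() %>%
  mutate(NAYSA = mean(c(STUFF, STUFF2))) %>%
  ungroup()

# Approach 2: vectorised rowMeans() on the ungrouped data
VECTORISED <- SAMPLED %>%
  mutate(NAYSA = rowMeans(select(., STUFF, STUFF2)))

# Both approaches should give the same row means
all.equal(BY_ROW$NAYSA, VECTORISED$NAYSA)
```

If the two columns match, choosing between the approaches comes down to readability and speed, which would need to be benchmarked separately.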

Jak*_*her 6

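Now that dplyr has introduced across, this can be done with across and base R's rowMeans. The following code takes the row-wise mean of the columns whose names start with the string "STUFF":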

DATA %>% 
  mutate(NAYSA = rowMeans(across(starts_with("STUFF"))))

  • With dplyr 1.1.0+, using `across()` with an empty `.fns` argument raises a warning and will eventually become an error: https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/. Use `pick()` instead. (2 upvotes)
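For reference, a minimal sketch of the `pick()` variant suggested in the comment above, assuming dplyr 1.1.0 or later is installed. The three-row DATA here is a shortened, illustrative version of the question's dataset.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Shortened, illustrative version of the question's data
DATA <- data.frame(SITE = c("A", "A", "B"),
                   DATE = c("1", "2", "3"),
                   STUFF = c(1, 30, 100),
                   STUFF2 = c(2, 60, 200))

# pick() selects the columns; rowMeans() averages them row by row
RESULT <- DATA %>%
  mutate(NAYSA = rowMeans(pick(starts_with("STUFF"))))

RESULT$NAYSA
# 1.5 45.0 150.0
```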