Remove outliers after group by, then calculate the mean of each group

Question

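I have a data frame. I want to first group it by a specific column (id), then remove outliers from another column (number) within each group, and finally calculate the mean of each group.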

library(dplyr)

# Build the example data frame in one step
total <- data.frame(
  id     = c("A", "B", "C", "A", "B", "B"),
  number = c(5, 10, 2, 6, 1000, 12)
)

I tried the following, but it does not work:

remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  val <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - val)] <- NA
  y[x > (qnt[2] + val)] <- NA
  y
}


df2 <- total %>% 
  group_by(id) %>% 
  mutate(mean_val = remove_outliers(number)) %>% 
  ungroup() %>% 
  filter(!is.na(mean_val))

I would appreciate it if anyone could help.

Input and expected output

Answer 1

Ron*_*hah 5

Group B does not have enough observations for 1000 to be treated as an outlier.

See:

remove_outliers(c(5, 1000, 12))
#[1]    5 1000   12

However, if you add one more observation, 1000 is treated as an outlier.

remove_outliers(c(5, 1000, 12, 6))
#[1]  5 NA 12  6

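To see why the extra observation matters, it can help to print the cut-offs that the 1.5 * IQR rule produces for each vector. The sketch below uses a hypothetical helper, `outlier_fences()`, which is not part of the original post; it only repeats the quantile and IQR arithmetic from `remove_outliers()` and assumes R's default quantile type.

```r
# Hypothetical helper (not from the original post): returns the lower and
# upper cut-offs that remove_outliers() would apply to x.
outlier_fences <- function(x, na.rm = TRUE) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, names = FALSE)
  val <- 1.5 * IQR(x, na.rm = na.rm)
  c(lower = qnt[1] - val, upper = qnt[2] + val)
}

outlier_fences(c(5, 1000, 12))     # upper fence is 1252.25, so 1000 is kept
outlier_fences(c(5, 1000, 12, 6))  # upper fence is 638.875, so 1000 is dropped
```

With only three values, 1000 itself pulls the third quartile and the IQR up so far that it falls inside its own upper fence.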

So in general, something like this should give you the expected output:

library(dplyr)

total %>% 
  group_by(id) %>% 
  mutate(mean_val = remove_outliers(number)) %>% 
  filter(!is.na(mean_val)) %>%
  mutate(mean_val = mean(mean_val)) %>%
  ungroup()

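If one row per group is wanted instead of the group mean repeated on every remaining row, a `summarise()` variant is one possible alternative. This is a minimal sketch, assuming the `total` data and the `remove_outliers()` function from the question; the name `group_means` is only illustrative.

```r
library(dplyr)

# remove_outliers() as defined in the question
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  val <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - val)] <- NA
  y[x > (qnt[2] + val)] <- NA
  y
}

# Example data from the question
total <- data.frame(
  id     = c("A", "B", "C", "A", "B", "B"),
  number = c(5, 10, 2, 6, 1000, 12)
)

# Flag outliers within each group, then average what is left
group_means <- total %>%
  group_by(id) %>%
  summarise(mean_val = mean(remove_outliers(number), na.rm = TRUE))

group_means
```

With this sample data the mean for group B still includes 1000, because three observations are too few for the 1.5 * IQR rule to flag it.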
