Subsetting a data frame by quantiles

And*_*iță 1 r function subset dplyr

If I have this data frame:

df <- data.frame(time = seq(as.Date('2000-01-01'), length.out = 200, by = 'days'),
                 a = rnorm(200, 8.4, 22), b = rnorm(200, 8.4, 22),
                 d = rnorm(200, 8.4, 22), e = rnorm(200, 8.4, 22))

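What is the simplest way to subset df so that the values in each column are greater than the 10th percentile and less than the 90th percentile of that column?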

I can do this with a loop, i.e.:

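for (i in names(df[, 2:5])) {
  print(i)
  column <- df[, c('time', i)]
  # 10% and 90% quantiles of the current column only
  q <- unname(quantile(column[, 2], probs = c(0.1, 0.9)))
  # keep the rows strictly inside that range
  column <- column[column[, 2] > q[1] & column[, 2] < q[2], ]
  # left join back onto df; dropped values become NA in a new suffixed column
  df <- merge(df, column, by = 'time', all.x = TRUE)
}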

But is there a simpler, more elegant way to do this with a function or a package such as dplyr? Thanks!

H 1*_*H 1 5

Here is a dplyr approach:

library(dplyr)

df %>% 
  mutate_at(vars(a:e), function(x) if_else(between(percent_rank(x), .1, .9), x, NA_real_))
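Two things are worth noting about the snippet above. It replaces out-of-range values with NA rather than dropping rows, and between() is inclusive at both ends, whereas the question asks for strictly greater and strictly less. Also, mutate_at() is superseded by across() in dplyr 1.0 and later. The following is a minimal sketch of the same idea, not the answerer's code: the helper name inside_quantiles and the set.seed() call are assumptions added for illustration, and it uses quantile() directly instead of percent_rank(), so the cut-offs follow R's default quantile definition. It assumes dplyr 1.0.4 or later for if_all().

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(
  time = seq(as.Date("2000-01-01"), length.out = 200, by = "days"),
  a = rnorm(200, 8.4, 22), b = rnorm(200, 8.4, 22),
  d = rnorm(200, 8.4, 22), e = rnorm(200, 8.4, 22)
)

# Hypothetical helper: TRUE where x lies strictly between its lower and
# upper quantiles (10% and 90% by default).
inside_quantiles <- function(x, probs = c(0.1, 0.9)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  x > q[1] & x < q[2]
}

# Variant 1: keep all 200 rows and set out-of-range values to NA.
masked <- df %>%
  mutate(across(a:e, ~ if_else(inside_quantiles(.x), .x, NA_real_)))

# Variant 2: keep only rows where every column is inside its own range.
subset_rows <- df %>%
  filter(if_all(a:e, inside_quantiles))
```

Variant 1 matches the shape of the result in the answer above (same number of rows, NA where a value falls outside the range). Variant 2 is closer to a literal subset, but because a row is dropped as soon as any one column is out of range, it can remove considerably more than 20% of the rows.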