use*_*441 2 r outliers dataframe
I have a data frame, for example this one:
names<-c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","c","c","c")
var1<-c(0.942999593,0.935507266,0.973589623,0.969415912,0.95230801,0.935507266,0.888740961,0.91750551,0.944482672,0.945468585,1.457579147,0.922206277,0.941511433,0.954724791,0.941014244,0.941511433,0.941511433,1.50511433)
var2<-c(-0.012678088,0.014313763,0.001138275,-0.020568206,0.012987126,0.001217192,0.03360358,0.009758172,0.015066932,-0.037879492,0.020471157,0.010738162,0.010952531,0.019377213,0.027140572,0.031116892,-0.018530676,-8.90E-05)
# data.frame() keeps var1 and var2 numeric; cbind() would coerce every column to character
df <- data.frame(names, var1, var2)
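I want to convert the outliers in columns var1 and var2 to NA. However, I want the outliers to be computed independently for each category in the "names" column. So the outliers for "a" in var1 would be those found using only the first 5 rows of var1.
I define outliers as all values below the 0.25 quantile or above the 0.75 quantile.
Is there a simple way to do this in R?
Thank you very much in advance.
Tina.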
Here is how you would do it for var1:
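# Quantiles of var1 computed separately within each group of names
quantiles <- tapply(var1, names, quantile)
# For every row, look up the 25% and 75% quantiles of that row's group
minq <- sapply(names, function(x) quantiles[[x]]["25%"])
maxq <- sapply(names, function(x) quantiles[[x]]["75%"])
# Values outside their group's [25%, 75%] range become NA
var1[var1 < minq | var1 > maxq] <- NA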
Repeat the same steps for var2 (or df$var2).
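To avoid repeating those steps by hand for each column, the same per-group rule can be wrapped in a function and applied to several columns at once. The following is only a sketch: `mask_outside_iqr`, `demo`, and `cols` are hypothetical names that do not appear in the original answer, and the small data frame is made up for illustration. It uses base R's `ave()` in place of the `tapply()`/`sapply()` lookup, which is a swapped-in technique rather than the answer's own method.

```r
# Hypothetical helper (not from the original answer): set to NA every value
# that lies outside the [25%, 75%] quantile range of its own group.
mask_outside_iqr <- function(x, g) {
  # ave() returns a vector as long as x, holding each row's group quantile
  lo <- ave(x, g, FUN = function(v) quantile(v, 0.25, na.rm = TRUE))
  hi <- ave(x, g, FUN = function(v) quantile(v, 0.75, na.rm = TRUE))
  x[!is.na(x) & (x < lo | x > hi)] <- NA
  x
}

# Made-up data, for illustration only
demo <- data.frame(
  names = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  var1  = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50),
  var2  = c(50, 40, 30, 20, 10, 5, 4, 3, 2, 1)
)

# Apply the helper to each numeric column, grouping by the names column
cols <- c("var1", "var2")
demo[cols] <- lapply(demo[cols], mask_outside_iqr, g = demo$names)
```

With the question's data, the equivalent call would be `df[cols] <- lapply(df[cols], mask_outside_iqr, g = df$names)`, assuming var1 and var2 are stored as numeric columns.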