Imputing missing values while keeping the circular trend in mind

Nav*_*swi 1 r data-manipulation

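Imagine a picture of a sunrise in which a red circle is surrounded by a thick yellow ring and a blue background. Code red as 3, yellow as 2, and blue as 1.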

 11111111111
 11111211111
 11112221111
 11222322211
 22223332222
 11222322221
 11112221111
 11111211111

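This is the desired output. However, the record/file/data has missing values (30% of all elements are missing).

How can we impute the missing values so that we recover the desired output, keeping the circular trend in mind?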

For*_*ens 13

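Here is how I would solve this kind of problem in a very simple, straightforward way. Note that I corrected the sample data above so that it is symmetric: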

d <- read.csv(header=FALSE, stringsAsFactors=FALSE, text="
1,1,1,1,1,1,1,1,1,1,1
1,1,1,1,1,2,1,1,1,1,1
1,1,1,1,2,2,2,1,1,1,1
1,1,2,2,2,3,2,2,2,1,1
2,2,2,2,3,3,3,2,2,2,2
1,1,2,2,2,3,2,2,2,1,1
1,1,1,1,2,2,2,1,1,1,1
1,1,1,1,1,2,1,1,1,1,1
")

library(raster)

##  Plot original data as raster:
d <- raster(as.matrix(d))
plot(d, col=colorRampPalette(c("blue","yellow","red"))(255))

##  Simulate missing data by blanking a random third of the cells
##    (close to the 30% stated in the question):
d_m <- d
d_m[ sample(1:length(d), length(d)/3) ] <- NA
plot(d_m, col=colorRampPalette(c("blue","yellow","red"))(255))

##  Construct a 3x3 filter for mean filling of missing values:
filter <- matrix(1, nrow=3, ncol=3) 

##  Fill in only missing values with the mean of the values within
##    the 3x3 moving window specified by the filter.  Note that this
##    could be replaced with a median/mode or some other whole-number
##    generating summary statistic:
r <- focal(d_m, filter, mean, na.rm=TRUE, NAonly=TRUE, pad=TRUE)

##  Plot imputed data:
plot(r, col=colorRampPalette(c("blue","yellow","red"))(255), zlim=c(1,3))

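Here is an image of the original sample data:

[Image: original sample data]

With 30% of the values simulated as missing:

[Image: missing values]

And with only those missing values interpolated, using the mean of a 3x3 moving window:

[Image: imputed data]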
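The code comment above notes that the mean could be swapped for a median or mode so that filled cells stay whole numbers. Below is a minimal base-R sketch of that idea. It is not part of the original answer: it works on a plain matrix rather than a raster, and `window_mode` and `fill_na_window` are hypothetical helper names introduced only for illustration.

```r
# Hypothetical helper: most frequent non-NA value in a window
# (ties are resolved in favour of the smallest value).
window_mode <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(NA_real_)
  tab <- table(v)
  as.numeric(names(tab)[which.max(tab)])
}

# Hypothetical helper: fill only the NA cells with a summary of their
# 3x3 neighbourhood. Windows are clipped at the matrix edges, and the
# neighbours are always read from the unfilled input matrix.
fill_na_window <- function(m, fun = window_mode) {
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (is.na(m[i, j])) {
        rows <- max(1, i - 1):min(nrow(m), i + 1)
        cols <- max(1, j - 1):min(ncol(m), j + 1)
        out[i, j] <- fun(as.vector(m[rows, cols]))
      }
    }
  }
  out
}

truth <- matrix(c(
  1,1,1,1,1,1,1,1,1,1,1,
  1,1,1,1,1,2,1,1,1,1,1,
  1,1,1,1,2,2,2,1,1,1,1,
  1,1,2,2,2,3,2,2,2,1,1,
  2,2,2,2,3,3,3,2,2,2,2,
  1,1,2,2,2,3,2,2,2,1,1,
  1,1,1,1,2,2,2,1,1,1,1,
  1,1,1,1,1,2,1,1,1,1,1), ncol = 11, byrow = TRUE)

# Blank out 30% of the cells (the seed is an arbitrary choice).
set.seed(42)
holes <- truth
holes[sample(length(truth), round(0.3 * length(truth)))] <- NA

filled <- fill_na_window(holes)

# Share of imputed cells that match the true value.
i <- is.na(holes)
mean(filled[i] == truth[i], na.rm = TRUE)
```

Passing `fun = function(v) median(v, na.rm = TRUE)` gives a median fill instead; a cell whose whole neighbourhood is missing stays `NA` with the mode helper.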

  • This is great! (4 upvotes)

Rob*_*ans 5

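Here I compare Forrest's approach with a thin plate spline (TPS). They perform about equally well, depending on the sample. If the gaps are so large that focal can no longer produce an estimate, TPS may be the better choice, but in that case you could also use a larger (perhaps Gaussian, see ?focalWeight) filter.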

d <- matrix(c(
1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,2,2,2,3,2,2,2,1,1,
2,2,2,2,3,3,3,2,2,2,2,
1,1,2,2,2,3,2,2,2,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1), ncol=11, byrow=TRUE)


library(raster)
d <- raster(d)
plot(d, col=colorRampPalette(c("blue","yellow","red"))(255))
##  Simulate missing data by blanking a random third of the cells
##    (close to the 30% stated in the question):
set.seed(1)
d_m <- d
d_m[ sample(1:length(d), length(d)/3) ] <- NA
plot(d_m, col=colorRampPalette(c("blue","yellow","red"))(255))


# Forrest's solution:
filter <- matrix(1, nrow=3, ncol=3) 
r <- focal(d_m, filter, mean, na.rm=TRUE, NAonly=TRUE, pad=TRUE)

# An alternative:
rp <- rasterToPoints(d_m)

library(fields)
# Thin plate spline interpolation
# (for a simple pattern like this, IDW might work, see ?interpolate)
tps <- Tps(rp[,1:2], rp[,3])
# Predict over the whole raster
x <- interpolate(d_m, tps)
# Use the original values where available
m <- cover(d_m, x)

i <- is.na(d_m)
cor(d[i], m[i])
## [1]  0.8846869
cor(d[i], r[i])
## [1] 0.8443165
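
The larger, Gaussian-weighted filter mentioned in this answer can be sketched without any packages. The following is an illustration in base R on a plain matrix, not the raster `focal` call itself: `gauss_kernel` and `fill_na_weighted` are hypothetical helper names, and the 5x5 window, `sigma = 1`, and the seed are arbitrary choices. Each missing cell gets the weighted mean of the non-missing cells in its window, with the weights renormalised over the cells that are actually present.

```r
# Hypothetical helper: square Gaussian kernel (odd size) that sums to 1.
gauss_kernel <- function(size = 5, sigma = 1) {
  half <- (size - 1) / 2
  off <- -half:half
  k <- outer(off, off, function(dy, dx) exp(-(dx^2 + dy^2) / (2 * sigma^2)))
  k / sum(k)
}

# Hypothetical helper: fill only NA cells with the kernel-weighted mean of
# the non-NA cells in the window, dividing by the sum of the weights used.
fill_na_weighted <- function(m, k) {
  half <- (nrow(k) - 1) / 2
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (!is.na(m[i, j])) next
      num <- 0
      den <- 0
      for (di in -half:half) {
        for (dj in -half:half) {
          ii <- i + di
          jj <- j + dj
          # Skip positions that fall outside the matrix.
          if (ii < 1 || ii > nrow(m) || jj < 1 || jj > ncol(m)) next
          v <- m[ii, jj]
          if (is.na(v)) next
          w <- k[di + half + 1, dj + half + 1]
          num <- num + w * v
          den <- den + w
        }
      }
      # Leave the cell as NA if the whole window is missing.
      if (den > 0) out[i, j] <- num / den
    }
  }
  out
}

truth <- matrix(c(
  1,1,1,1,1,1,1,1,1,1,1,
  1,1,1,1,1,2,1,1,1,1,1,
  1,1,1,1,2,2,2,1,1,1,1,
  1,1,2,2,2,3,2,2,2,1,1,
  2,2,2,2,3,3,3,2,2,2,2,
  1,1,2,2,2,3,2,2,2,1,1,
  1,1,1,1,2,2,2,1,1,1,1,
  1,1,1,1,1,2,1,1,1,1,1), ncol = 11, byrow = TRUE)

set.seed(1)
holes <- truth
holes[sample(length(truth), round(0.3 * length(truth)))] <- NA

k <- gauss_kernel(5, 1)
filled_g <- fill_na_weighted(holes, k)

# Same style of check as above: correlation on the imputed cells only.
i <- is.na(holes)
cor(truth[i], filled_g[i], use = "complete.obs")
```

Because this runs on a plain matrix with its own random sample of missing cells, its correlation is not directly comparable to the two values printed above.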