Sha*_*hen 5 r linear-regression lm
如何自动提取曲线中R^2对整个曲线不理想的拟合良好的线性部分?
例如我有什么:
数据.lm
x y
1 1 1
2 2 8
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 5
9 9 2
10 10 7
Run Code Online (Sandbox Code Playgroud)
rg.lm<- lm(y~x, data.lm) rg.lm
Coefficients:
(Intercept) x
3.7333 0.1939
Run Code Online (Sandbox Code Playgroud)
摘要(rg.lm)
Residuals:
Min 1Q Median 3Q Max
-3.4788 -1.1136 0.0061 1.2712 3.8788
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7333 1.6111 2.317 0.0491 *
x 0.1939 0.2597 0.747 0.4765
Residual standard error: 2.358 on 8 degrees of freedom
Multiple R-squared: 0.06519, Adjusted R-squared: -0.05166
F-statistic: 0.5579 on 1 and 8 DF, p-value: 0.4765
Run Code Online (Sandbox Code Playgroud)
我的期望:
data.lm.ex<- 未知函数 (data.lm) data.lm.ex
x y
1 3 3
2 4 4
3 5 5
4 6 6
7 7 7
Run Code Online (Sandbox Code Playgroud)
另一个例子来自真实数据:
数据.lm
time OD
1 0 2.175
2 30 2.134
3 60 2.189
4 90 2.141
5 120 2.854
6 150 3.331
7 180 3.642
8 210 4.333
9 240 4.987
10 270 5.093
11 300 4.943
12 330 5.198
13 360 4.804
Run Code Online (Sandbox Code Playgroud)
摘要(lm(data.lm))$r.squared
[1] 0.8981063
Run Code Online (Sandbox Code Playgroud)
摘要(lm(data.lm[4:9,]))$r.squared
[1] 0.9886727
Run Code Online (Sandbox Code Playgroud)
如上所示,第4行到第9行之间的间隔绝对比整条曲线的r^2要高。您能否让我知道自动找到呈现最高 r^2 且至少具有一定数量点的区间的方法(由于 2 个点总是呈现 r^2=1.0)?
这应该有效:
a <- cbind(1:10, c(1,8,3:7,5,2,7))
tmp <- rle(diff(a[,2]))
ml <- max(tmp$lengths)
i1 <- which(ml==tmp$lengths)[1]
a[seq(i1,i1+ml),]
Run Code Online (Sandbox Code Playgroud)
更新
a <- data.frame(x=c(0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360),
y=c(2.175, 2.134, 2.189, 2.141, 2.854, 3.331, 3.642, 4.333, 4.987, 5.093, 4.943, 5.198, 4.804))
b <- diff(a[,2])/diff(a[,1])
b.k <- kmeans(b,3)
b.max <- max(abs(b.k$centers))
b.v <- which(b.k$cluster == match(b.max, b.k$centers))
RES <- a[b.v,]
plot(a)
points(RES,pch=15)
abline(coef(lm(y~x,RES)), col="red")
Run Code Online (Sandbox Code Playgroud)
精炼版:
library(zoo)
a <- data.frame(x=c(0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360),
y=c(2.175, 2.134, 2.189, 2.141, 2.854, 3.331, 3.642, 4.333, 4.987, 5.093, 4.943, 5.198, 4.804))
f <- function (d) {
m <- lm(y~x, as.data.frame(d))
return(coef(m)[2])
}
co <- rollapply(a, 3, f, by.column=F)
co.cl <- kmeans(co, 2)
b.points <- which(co.cl$cluster == match(max(co.cl$centers), co.cl$centers))+1
RES <- a[b.points,]
plot(a)
points(RES,pch=15,col="red")
abline(lm(y~x,RES),col="blue")
Run Code Online (Sandbox Code Playgroud)
[![改进版本]](https://i.stack.imgur.com/LxqFf.png)