Ric*_*ard 6 interaction regression
我们先来看一下lm。我有一个连续的解释性 $X$ 和一个因子 $F$ 来建模季节性方面(在示例中为 8 个级别)。
让 $\\beta$ 表示 $X$ 的斜率,然后我想对斜率与因子的相互作用进行建模。这是某种物理模型,因此我假设交互作用仅对 8 个级别中的 2 个级别有意义。\n如何表达?我想使用一个普通的公式,因为稍后我想将其放入AER包(函数tobit)中的审查回归中
数据是:
\n\nN = 50\nf = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)\nfcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)\nbeta = rep(c(5,5,5,8,4,5,5,5),N)\nset.seed(100) \nx = rnorm(8*N)+1\nepsilon = rnorm(8*N,sd = sqrt(1/5))\ny = x*beta+fcoeff+epsilon\nRun Code Online (Sandbox Code Playgroud)\n\n与所有相互作用的拟合给出了准确的结果
\n\nfit <- lm(y~0+x+x*f)\nsummary(fit)\n\nCall:\nlm(formula = y ~ 0 + x + x * f)\n\nResiduals:\n Min 1Q Median 3Q Max \n-1.41018 -0.30296 0.01818 0.32657 1.20677 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \nx 5.039064 0.075818 66.463 <2e-16 ***\nfs1 -0.945112 0.088072 -10.731 <2e-16 ***\nfs2 -2.107483 0.103590 -20.344 <2e-16 ***\nfs3 -2.992401 0.088164 -33.941 <2e-16 ***\nfs4 -4.054411 0.094878 -42.733 <2e-16 ***\nfs5 -2.730448 0.094815 -28.798 <2e-16 ***\nfs6 -5.232721 0.102254 -51.174 <2e-16 ***\nfs7 -9.969175 0.096307 -103.515 <2e-16 ***\nfs8 -4.922782 0.092917 -52.980 <2e-16 ***\nx:fs2 -0.006081 0.097748 -0.062 0.950 \nx:fs3 -0.050684 0.102124 -0.496 0.620 \nx:fs4 2.988702 0.103652 28.834 <2e-16 ***\nx:fs5 -1.196775 0.105139 -11.383 <2e-16 ***\nx:fs6 0.099112 0.103811 0.955 0.340 \nx:fs7 -0.007648 0.110908 -0.069 0.945 \nx:fs8 -0.107148 0.094346 -1.136 0.257 \n---\nSignif. codes: 0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1\n\nResidual standard error: 0.4705 on 384 degrees of freedom\nMultiple R-squared: 0.9942, Adjusted R-squared: 0.994 \nF-statistic: 4120 on 16 and 384 DF, p-value: < 2.2e-16\nRun Code Online (Sandbox Code Playgroud)\n\ns4我如何才能模拟与和仅的交互s5?我可以从拟合中删除其他交互作用以进行进一步预测吗?
我尝试将因子一分为二,但模型变得奇异:
\n\nf = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)\nfcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)\nf2 = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)\nf[f %in% c("s4","s5")] <- "no.inter"\nf2[f2 %in% c("s1","s2","s3","s6","s7","s8")] <- "rest"\n\nfit <- lm(y~0+x+x*f2+ f)\nsummary(fit)\n\nCall:\nlm(formula = y ~ 0 + x + x * f2 + f)\n\nResiduals:\n Min 1Q Median 3Q Max \n-1.41018 -0.31544 0.00653 0.31615 1.20670 \n\nCoefficients: (1 not defined because of singularities)\n Estimate Std. Error t value Pr(>|t|) \nx 5.01794 0.02756 182.106 <2e-16 ***\nf2rest -5.02213 0.07381 -68.045 <2e-16 ***\nf2s4 -4.05441 0.09495 -42.702 <2e-16 ***\nf2s5 -2.73045 0.09488 -28.777 <2e-16 ***\nfs1 4.09310 0.09480 43.177 <2e-16 ***\nfs2 2.93401 0.09424 31.132 <2e-16 ***\nfs3 2.00475 0.09456 21.201 <2e-16 ***\nfs6 -0.07894 0.09419 -0.838 0.402 \nfs7 -4.93545 0.09452 -52.213 <2e-16 ***\nfs8 NA NA NA NA \nx:f2s4 3.00983 0.07591 39.651 <2e-16 ***\nx:f2s5 -1.17565 0.07793 -15.086 <2e-16 ***\n---\nSignif. codes: 0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1\n\nResidual standard error: 0.4709 on 389 degrees of freedom\nMultiple R-squared: 0.9941, Adjusted R-squared: 0.994 \nF-statistic: 5983 on 11 and 389 DF, p-value: < 2.2e-16\nRun Code Online (Sandbox Code Playgroud)\n
最简单的方法可能是操作模型矩阵以删除不需要的列:
xx <- model.matrix(y ~ 0 + x + x*f)
omit <- grep("[:]fs[^45]", colnames(xx))
xx <- xx[, -omit]
lm(y ~ 0 + xx)
Run Code Online (Sandbox Code Playgroud)
输出:
Call:
lm(formula = y ~ 0 + xx)
Coefficients:
xxx xxfs1 xxfs2 xxfs3 xxfs4 xxfs5 xxfs6 xxfs7 xxfs8 xxx:fs4 xxx:fs5
5.018 -0.929 -2.088 -3.017 -4.054 -2.730 -5.101 -9.958 -5.022 3.010 -1.176
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
2474 次 |
| 最近记录: |