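I am running a zero-inflated model on CPUE data. The data show evidence of zero inflation, which I confirmed with a Vuong test (in the code below). According to AIC, the full model (zint) is better than the null model. I now want to:
I have asked for help from several statisticians in my department (who had never done this before and sent me to the same sites that a Google search turns up), from the statistics department itself (everyone is too busy), and on the stackoverflow feed.
I would appreciate code, or pointers to books (freely available online), that cover visualizing a ZIP model and its fit when an offset variable is used.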
library(pscl)  # provides zeroinfl() and vuong()
yc <- read.csv("CPUE_ycs_trawl_withcobb_BLS.csv", header = TRUE)
yc <- yc[which(yc$countinyear < 150), ]
yc$fyear <- as.factor(yc$year_cap)
yc$flocation <- as.factor(yc$location)
hist(yc$countinyear, 20)
yc$logoffset <- log(yc$numtrawlyr)
### Run zero-inflated Poisson with offset for CPUE ###
# Refer to columns by name inside the formula and let data = yc resolve them.
null <- formula(countinyear ~ 1 | 1)
znull <- zeroinfl(null, offset = logoffset, dist = "poisson", link = "logit",
                  data = yc)
int <- formula(countinyear ~ assnage * spawncob | assnage * spawncob)
zint <- zeroinfl(int, offset = logoffset, dist = "poisson", link = "logit",
                 data = yc)
AIC(znull, zint)
g1 <- glm(countinyear ~ assnage * spawncob,
          offset = logoffset, data = yc, family = poisson)
summary(g1)
### Vuong test to see if ZIP is even needed ###
vuong(g1, zint)
##########DATASET###########
countinyear is column 1.
##########DATASET###########
count assnage spawncob logoffset
56 0 0.32110173 2.833213
44 1 0.33712 2.833213
60 2 0.34053264 2.833213
0 4 0.19381496 2.833213
1 3 0.30819333 2.833213
33 0 0.32110173 2.833213
40 1 0.33712 2.833213
25 2 0.34053264 2.833213
0 3 0.30819333 2.833213
2 4 0.19381496 2.833213
6 0 0.32110173 2.833213
13 1 0.33712 2.833213
7 2 0.34053264 2.833213
0 3 0.30819333 2.833213
0 4 0.19381496 NA
5 0 0.32110173 2.833213
31 1 0.33712 2.833213
73 2 0.34053264 2.833213
0 3 0.30819333 2.833213
1 4 0.19381496 2.833213
0 0 0.32110173 2.833213
7 1 0.33712 2.833213
75 2 0.34053264 2.833213
3 3 0.30819333 2.833213
0 4 0.19381496 2.833213
19 0 0.32110173 2.833213
13 1 0.33712 2.833213
18 2 0.34053264 2.833213
0 3 0.30819333 2.833213
2 4 0.19381496 2.833213
11 0 0.32110173 2.833213
14 1 0.33712 2.833213
32 2 0.34053264 2.833213
1 3 0.30819333 2.833213
1 4 0.19381496 2.833213
12 0 0.32110173 2.833213
3 1 0.33712 2.833213
9 2 0.34053264 2.833213
2 3 0.30819333 2.833213
0 4 0.19381496 2.833213
5 0 0.32110173 2.833213
15 1 0.33712 2.833213
22 2 0.34053264 2.833213
5 3 0.30819333 2.833213
1 4 0.19381496 2.833213
1 0 0.32110173 2.833213
16 1 0.33712 2.833213
33 2 0.34053264 2.833213
4 3 0.30819333 2.833213
2 4 0.19381496 2.833213
6 0 0.32110173 2.833213
17 1 0.33712 2.833213
26 2 0.34053264 2.833213
1 3 0.30819333 2.833213
0 4 0.19381496 2.833213
16 0 0.32110173 2.833213
16 1 0.33712 2.833213
11 2 0.34053264 2.833213
1 3 0.30819333 2.833213
1 4 0.19381496 2.833213
2 0 0.32110173 2.833213
8 1 0.33712 2.833213
18 2 0.34053264 2.833213
0 3 0.30819333 2.833213
0 4 0.19381496 2.833213
2 0 0.32110173 2.833213
27 1 0.33712 2.833213
49 2 0.34053264 2.833213
1 3 0.30819333 2.833213
0 4 0.19381496 2.833213
1 0 0.32110173 2.833213
6 1 0.33712 2.833213
36 2 0.34053264 2.833213
17 3 0.30819333 2.833213
0 4 0.19381496 2.833213
10 0 0.32110173 2.833213
21 1 0.33712 2.833213
78 2 0.34053264 2.833213
32 3 0.30819333 2.833213
0 4 0.19381496 2.833213
0 0 0.32110173 2.833213
8 1 0.33712 2.833213
14 2 0.34053264 2.833213
7 3 0.30819333 2.833213
0 4 0.19381496 2.833213
0 1 0.13648433 2.833213
6 1 0.23952033 2.833213
12 2 0.32110173 2.833213
0 3 0.33712 2.833213
0 4 0.34053264 2.833213
30 0 0.13648433 2.833213
30 1 0.23952033 2.833213
25 2 0.32110173 2.833213
30 3 0.33712 2.833213
30 4 0.34053264 2.833213
68 0 0.13648433 2.833213
68 1 0.23952033 2.833213
55 2 0.32110173 2.833213
68 3 0.33712 2.833213
68 4 0.34053264 2.833213
0 0 0.13648433 2.833213
12 1 0.23952033 2.833213
26 2 0.32110173 2.833213
2 3 0.33712 2.833213
1 4 0.34053264 2.833213
0 0 0.13648433 2.833213
17 1 0.23952033 2.833213
36 2 0.32110173 2.833213
1 3 0.33712 2.833213
4 4 0.34053264 2.833213
1 0 0.13648433 2.833213
1 1 0.23952033 2.833213
4 2 0.32110173 2.833213
4 3 0.33712 2.833213
0 4 0.34053264 2.833213
3 0 0.13648433 2.833213
3 1 0.23952033 2.833213
3 2 0.32110173 2.833213
3 3 0.33712 2.833213
3 4 0.34053264 2.833213
0 0 0.13648433 2.833213
29 1 0.23952033 2.833213
33 2 0.32110173 2.833213
0 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
10 1 0.23952033 2.833213
7 2 0.32110173 2.833213
1 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
6 1 0.23952033 2.833213
18 2 0.32110173 2.833213
1 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
18 1 0.23952033 2.833213
37 2 0.32110173 2.833213
1 3 0.33712 2.833213
1 4 0.34053264 2.833213
0 0 0.13648433 2.833213
13 1 0.23952033 2.833213
26 2 0.32110173 2.833213
8 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
0 1 0.23952033 2.833213
1 2 0.32110173 2.833213
0 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
1 1 0.23952033 2.833213
5 2 0.32110173 2.833213
0 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
29 1 0.23952033 2.833213
15 2 0.32110173 2.833213
2 3 0.33712 2.833213
0 4 0.34053264 2.833213
0 0 0.13648433 2.833213
19 1 0.23952033 2.833213
25 2 0.32110173 2.833213
3 3 0.33712 2.833213
1 4 0.34053264 2.833213
0 0 0.13648433 2.833213
24 1 0.23952033 2.833213
40 2 0.32110173 2.833213
6 3 0.33712 2.833213
1 4 0.34053264 2.833213
0 0 0.03678637 2.772589
28 1 0.07414634 2.772589
28 2 0.13648433 2.772589
3 3 0.23952033 2.772589
2 4 0.32110173 2.772589
0 0 0.03678637 2.772589
3 1 0.07414634 2.772589
2 2 0.13648433 2.772589
0 3 0.23952033 2.772589
0 4 0.32110173 2.772589
4 0 0.03678637 2.772589
14 1 0.07414634 2.772589
6 2 0.13648433 2.772589
0 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
6 1 0.07414634 2.772589
3 2 0.13648433 2.772589
2 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
8 1 0.07414634 2.772589
2 2 0.13648433 2.772589
4 3 0.23952033 2.772589
1 4 0.32110173 2.772589
1 0 0.03678637 2.772589
12 1 0.07414634 2.772589
23 2 0.13648433 2.772589
0 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
24 1 0.07414634 2.772589
56 2 0.13648433 2.772589
7 3 0.23952033 2.772589
4 4 0.32110173 2.772589
0 0 0.03678637 2.772589
22 1 0.07414634 2.772589
45 2 0.13648433 2.772589
3 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
2 1 0.07414634 2.772589
18 2 0.13648433 2.772589
1 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
5 1 0.07414634 2.772589
18 2 0.13648433 2.772589
5 3 0.23952033 2.772589
1 4 0.32110173 2.772589
0 0 0.03678637 2.772589
9 1 0.07414634 2.772589
25 2 0.13648433 2.772589
6 3 0.23952033 2.772589
1 4 0.32110173 2.772589
0 0 0.03678637 2.772589
1 1 0.07414634 2.772589
3 2 0.13648433 2.772589
1 3 0.23952033 2.772589
1 4 0.32110173 2.772589
0 0 0.03678637 2.772589
3 1 0.07414634 2.772589
16 2 0.13648433 2.772589
0 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
7 1 0.07414634 2.772589
21 2 0.13648433 2.772589
8 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
5 1 0.07414634 2.772589
22 2 0.13648433 2.772589
6 3 0.23952033 2.772589
0 4 0.32110173 2.772589
0 0 0.03678637 2.772589
11 1 0.07414634 2.772589
22 2 0.13648433 2.772589
6 3 0.23952033 2.772589
0 4 0.32110173 2.772589
1 0 0.11532605 2.564949
7 1 0.05628636 2.564949
11 2 0.03678637 2.564949
0 3 0.07414634 2.564949
0 4 0.13648433 2.564949
0 0 0.11532605 2.564949
4 1 0.05628636 2.564949
4 2 0.03678637 2.564949
0 3 0.07414634 2.564949
0 4 0.13648433 2.564949
0 0 0.11532605 2.564949
0 1 0.05628636 2.564949
5 2 0.03678637 2.564949
0 3 0.07414634 2.564949
1 4 0.13648433 2.564949
0 0 0.11532605 2.564949
3 1 0.05628636 2.564949
4 2 0.03678637 2.564949
0 3 0.07414634 2.564949
0 4 0.13648433 2.564949
0 0 0.11532605 2.564949
3 1 0.05628636 2.564949
0 2 0.03678637 2.564949
1 3 0.07414634 2.564949
0 4 0.13648433 2.564949
0 0 0.11532605 2.564949
1 1 0.05628636 2.564949
0 2 0.03678637 2.564949
0 3 0.07414634 2.564949
0 4 0.13648433 2.564949
0 0 0.11532605 2.564949
6 1 0.05628636 2.564949
9 2 0.03678637 2.564949
3 3 0.07414634 2.564949
0 4 0.13648433 2.564949
0 0 0.11532605 2.564949
3 1 0.05628636 2.564949
4 2 0.03678637 2.564949
3 3 0.07414634 2.564949
1 4 0.13648433 2.564949
0 0 0.11532605 2.564949
1 1 0.05628636 2.564949
3 2 0.03678637 2.564949
4 3 0.07414634 2.564949
0 4 0.13648433 2.564949
1 0 0.11532605 2.564949
3 1 0.05628636 2.564949
10 2 0.03678637 2.564949
2 3 0.07414634 2.564949
1 4 0.13648433 2.564949
0 0 0.11532605 2.564949
0 1 0.05628636 2.564949
3 2 0.03678637 2.564949
3 3 0.07414634 2.564949
1 4 0.13648433 2.564949
0 0 0.11532605 2.564949
24 1 0.05628636 2.564949
43 2 0.03678637 2.564949
11 3 0.07414634 2.564949
3 4 0.13648433 2.564949
0 0 0.11532605 2.564949
3 1 0.05628636 2.564949
19 2 0.03678637 2.564949
14 3 0.07414634 2.564949
2 4 0.13648433 2.564949
0 0 0.09016875 NA
25 1 0.14227471 2.833213
2 2 0.11532605 2.833213
0 3 0.05628636 2.833213
0 4 0.03678637 2.833213
0 0 0.09016875 2.833213
14 1 0.14227471 2.833213
0 2 0.11532605 2.833213
0 3 0.05628636 2.833213
0 4 0.03678637 2.833213
0 0 0.09016875 2.833213
12 1 0.14227471 2.833213
4 2 0.11532605 2.833213
0 3 0.05628636 2.833213
0 4 0.03678637 2.833213
1 0 0.09016875 2.833213
42 1 0.14227471 2.833213
20 2 0.11532605 2.833213
1 3 0.05628636 2.833213
2 4 0.03678637 2.833213
0 0 0.09016875 2.833213
48 1 0.14227471 2.833213
40 2 0.11532605 2.833213
1 3 0.05628636 2.833213
0 4 0.03678637 2.833213
10 0 0.09016875 2.833213
23 2 0.11532605 2.833213
0 3 0.05628636 2.833213
2 4 0.03678637 2.833213
2 0 0.09016875 2.833213
89 1 0.14227471 2.833213
5 2 0.11532605 2.833213
1 3 0.05628636 2.833213
6 4 0.03678637 2.833213
0 0 0.09016875 2.833213
27 1 0.14227471 2.833213
9 2 0.11532605 2.833213
3 3 0.05628636 2.833213
2 4 0.03678637 2.833213
1 0 0.09016875 2.833213
6 1 0.14227471 2.833213
0 2 0.11532605 2.833213
1 3 0.05628636 2.833213
0 4 0.03678637 2.833213
0 0 0.09016875 2.833213
65 1 0.14227471 2.833213
35 2 0.11532605 2.833213
1 3 0.05628636 2.833213
2 4 0.03678637 2.833213
0 0 0.09016875 2.833213
29 1 0.14227471 2.833213
26 2 0.11532605 2.833213
3 3 0.05628636 2.833213
1 4 0.03678637 2.833213
4 0 0.09016875 2.833213
105 1 0.14227471 2.833213
5 2 0.11532605 2.833213
0 3 0.05628636 2.833213
1 4 0.03678637 2.833213
4 0 0.09016875 2.833213
107 1 0.14227471 2.833213
5 2 0.11532605 2.833213
0 3 0.05628636 2.833213
0 4 0.03678637 2.833213
0 0 0.09016875 2.833213
17 1 0.14227471 2.833213
1 2 0.11532605 2.833213
0 3 0.05628636 2.833213
0 4 0.03678637 2.833213
3 0 0.09016875 2.833213
106 1 0.14227471 2.833213
1 2 0.11532605 2.833213
1 3 0.05628636 2.833213
0 4 0.03678637 2.833213
0 0 0.09016875 2.833213
21 1 0.14227471 2.833213
14 2 0.11532605 2.833213
5 3 0.05628636 2.833213
1 4 0.03678637 2.833213
0 0 0.09016875 2.833213
35 1 0.14227471 2.833213
12 2 0.11532605 2.833213
8 3 0.05628636 2.833213
2 4 0.03678637 2.833213
4 0 0.13510174 1.791759
1 1 0.10188844 1.791759
4 2 0.09016875 1.791759
0 3 0.14227471 1.791759
0 4 0.11532605 1.791759
3 0 0.13510174 1.791759
16 1 0.10188844 1.791759
11 2 0.09016875 1.791759
0 3 0.14227471 1.791759
0 4 0.11532605 1.791759
4 0 0.13510174 1.791759
20 1 0.10188844 1.791759
7 2 0.09016875 1.791759
0 3 0.14227471 1.791759
0 4 0.11532605 1.791759
0 0 0.13510174 1.791759
3 1 0.10188844 1.791759
1 2 0.09016875 1.791759
1 3 0.14227471 1.791759
1 4 0.11532605 1.791759
0 0 0.13510174 1.791759
2 1 0.10188844 1.791759
8 2 0.09016875 1.791759
2 3 0.14227471 1.791759
1 4 0.11532605 1.791759
0 0 0.13510174 1.791759
1 1 0.10188844 1.791759
40 2 0.09016875 1.791759
8 3 0.14227471 1.791759
0 4 0.11532605 1.791759
0 0 0.33638851 2.70805
0 1 0.20354567 2.70805
18 2 0.13510174 2.70805
2 3 0.10188844 2.70805
0 4 0.09016875 2.70805
0 0 0.33638851 2.70805
0 1 0.20354567 2.70805
1 2 0.13510174 2.70805
0 3 0.10188844 2.70805
0 4 0.09016875 2.70805
0 0 0.33638851 2.70805
1 1 0.20354567 2.70805
1 2 0.13510174 2.70805
0 3 0.10188844 2.70805
0 4 0.09016875 2.70805
0 0 0.33638851 2.70805
13 1 0.20354567 2.70805
23 2 0.13510174 2.70805
1 3 0.10188844 2.70805
13 4 0.09016875 2.70805
0 0 0.33638851 2.70805
1 1 0.20354567 2.70805
8 2 0.13510174 2.70805
3 3 0.10188844 2.70805
4 4 0.09016875 2.70805
0 0 0.33638851 2.70805
2 1 0.20354567 2.70805
9 2 0.13510174 2.70805
2 3 0.10188844 2.70805
0 4 0.09016875 2.70805
26 0 0.33638851 2.70805
2
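To visualize the goodness of fit of a probabilistic regression model, the "standard" residuals (e.g., Pearson or deviance) are often not very informative, because they mainly capture how well the mean is modelled rather than the whole distribution. One alternative that is sometimes used is (randomized) quantile residuals. Without randomization they are defined as qnorm(pdist(y)), where pdist() is the fitted distribution function (here the ZIP model), y is the observation, and qnorm() is the quantile function of the standard normal distribution. If the model fits, the residuals should follow a standard normal distribution, which can be checked in a Q-Q plot. For discrete distributions (as here), randomization is needed to break up the discrete nature of the data. See Dunn & Smyth (1996, Journal of Computational and Graphical Statistics, 5, 236-244) for details. In R, you can compute these with the countreg package from R-Forge (hopefully on CRAN soon as well).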
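The definition above can be sketched in base R without countreg. The following is only an illustration on simulated data with a plain Poisson GLM; the data, the variable names, and the choice of a Poisson model are assumptions made for the example, not part of the analysis of yc. For a ZIP fit, ppois() would be replaced by the fitted ZIP distribution function.

```r
# Randomized quantile residuals (Dunn & Smyth style) for a Poisson GLM.
# All data here are simulated; names are hypothetical.
set.seed(1)
n <- 500
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + x))

fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)

# For a discrete response, draw u uniformly between F(y - 1) and F(y),
# where F is the fitted distribution function. ppois(-1, mu) is 0.
p_lo <- ppois(y - 1, mu)
p_hi <- ppois(y, mu)
u <- runif(n, min = p_lo, max = p_hi)

# Map to the standard normal scale.
rqres <- qnorm(u)

# If the model fits, the points should lie close to the reference line.
qqnorm(rqres, main = "Randomized quantile residuals")
qqline(rqres)
```

Because the data are generated from the same Poisson model that is fitted, the residuals in this sketch should look close to standard normal.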
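Another way to check the marginal distribution of the data is the so-called rootogram. It visually compares the observed and fitted frequencies for the counts 0, 1, .... Compared with a Q-Q plot of randomized quantile residuals, it is often better at revealing problems with excess zeros and/or overdispersion. For more details, see our paper Kleiber & Zeileis (2016, The American Statistician, 70(3), 296–303, doi:10.1080/00031305.2016.1173590).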
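To make the idea concrete, here is a minimal hand-rolled hanging rootogram in base R. It is a sketch under stated assumptions: the data are simulated from a negative binomial distribution and deliberately fitted with an intercept-only Poisson GLM, so that the misfit is visible. It is not the rootogram() implementation from countreg.

```r
# Hanging rootogram by hand: sqrt(expected) as a curve, bars of height
# sqrt(observed) hanging down from it. Simulated, overdispersed data.
set.seed(42)
n <- 1000
y <- rnbinom(n, mu = 3, size = 1)

fit <- glm(y ~ 1, family = poisson)   # deliberately too simple
mu <- fitted(fit)

k <- 0:15
obs <- sapply(k, function(j) sum(y == j))           # observed frequencies
expd <- sapply(k, function(j) sum(dpois(j, mu)))    # expected frequencies

top <- sqrt(expd)                 # bars hang from the fitted curve
bottom <- sqrt(expd) - sqrt(obs)  # below 0: more observed than expected

plot(k, top, type = "n", ylim = range(c(top, bottom)),
     xlab = "count", ylab = "sqrt(Frequency)", main = "Poisson fit")
rect(k - 0.45, bottom, k + 0.45, top, col = "grey")
lines(k, top, type = "b", pch = 19, col = "red")
abline(h = 0, lty = 2)
```

Bars that reach below the dashed zero line mark counts that occur more often than the model expects, and bars that stop above it mark counts that occur less often. In this sketch the bar at 0 hangs far below the line, which is the signature of excess zeros relative to the Poisson fit.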
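Applying these to your regression models quickly shows that the zero-inflated Poisson does not account for the overdispersion in the response. (When counts reach 100 or more, Poisson-based distributions almost never fit well.) In addition, the zero-inflation model does not fit well because for assnage = 1 and = 2 there are very few zeros, so no zero inflation is needed. This drives the corresponding coefficients in the zero-inflation part toward -Inf with very large standard errors (like quasi-complete separation in binary regression). A two-part hurdle model is therefore more suitable and probably easier to interpret. Finally, since the assnage groups are so different, I would code assnage as a factor (it is not clear to me whether you have already done this).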
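The separation effect mentioned above can be reproduced in a small binary regression. This is an illustrative sketch with made-up data, not your data: group "b" never shows the event, which mirrors a group that needs no zero inflation at all.

```r
# Quasi-complete separation in a binary regression (made-up data).
# Group "a" has events half of the time, group "b" never has one.
g <- factor(rep(c("a", "b"), each = 50))
z <- c(rep(c(0, 1), times = 25), rep(0, 50))

# glm() typically warns about fitted probabilities of 0 or 1 here;
# the warning is suppressed only to keep the output short.
fit <- suppressWarnings(glm(z ~ g, family = binomial))
est <- coef(summary(fit))

# The coefficient for group "b" drifts toward -Inf and its standard
# error becomes huge, because the likelihood is flat in that direction.
print(round(est[, c("Estimate", "Std. Error")], 2))
```

The estimate for group "b" is only where the iterations happened to stop, so neither it nor its standard error should be interpreted at face value.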
So, to analyze your data, I use yc as provided in your post and make sure that assnage is a factor:

yc$assnage <- factor(yc$assnage)

For a first exploratory look at the effect of assnage, I plot whether count is positive (left: zero hurdle) and the positive values of count on a log scale (right: count).

plot(factor(count > 0, levels = c(FALSE, TRUE), labels = c("=0", ">0")) ~ assnage,
  data = yc, ylab = "count", main = "Zero hurdle")
plot(count ~ assnage, data = yc, subset = count > 0,
  log = "y", main = "Count (positive)")

Then I fit the ZIP, ZINB, and hurdle NB models using the countreg package from R-Forge, which also contains updated versions of the zeroinfl() and hurdle() functions.

install.packages("countreg", repos = "http://R-Forge.R-project.org")
library("countreg")
zip <- zeroinfl(count ~ assnage * spawncob, offset = logoffset,
  data = yc, dist = "poisson")
zinb <- zeroinfl(count ~ assnage * spawncob, offset = logoffset,
  data = yc, dist = "negbin")
hnb <- hurdle(count ~ assnage * spawncob, offset = logoffset, data = yc,
  dist = "negbin")

The ZIP is clearly inadequate, and the hurdle NB is slightly better than the ZINB.
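
BIC(zip, zinb, hnb)
##      df      BIC
## zip  20 7700.085
## zinb 21 3574.720
## hnb  21 3556.693

If you inspect summary(zinb), you will also see that some coefficients in the zero-inflation part (those for the dummy variables) have a magnitude of about 10, with standard errors that are one or two orders of magnitude larger. This essentially means that the zero-inflation probability is zero in the corresponding groups, because the negative binomial distribution already puts enough probability weight on a zero response (assnage groups 1 and 2).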
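As a small worked illustration of why such coefficients imply a zero-inflation probability of essentially zero, the logit link can be inverted with plogis(). The numbers below are hypothetical values chosen for the example, not estimates taken from summary(zinb).

```r
# Inverting the logit link for the zero-inflation part.
# Both values are hypothetical, chosen only for illustration.
eta_baseline <- 0     # linear predictor in a reference group
coef_dummy <- -10     # a large negative dummy coefficient

p_baseline <- plogis(eta_baseline)              # 0.5
p_group <- plogis(eta_baseline + coef_dummy)    # about 4.5e-05

# Moving the coefficient even further out barely changes the
# probability, which is why its standard error is so large.
p_further <- plogis(eta_baseline + 2 * coef_dummy)

print(c(baseline = p_baseline, group = p_group, further = p_further))
```

On the probability scale the difference between the last two values is negligible, so the data carry almost no information about the exact size of the coefficient.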
To visualize that the ZIP model does not fit while the HNB captures the response adequately, we can now use rootograms.

rootogram(zip, main = "ZIP", ylim = c(-5, 15), max = 50)
rootogram(hnb, main = "HNB", ylim = c(-5, 15), max = 50)

The wave-like pattern for the ZIP clearly shows the overdispersion in the data that the model does not capture properly. In contrast, the hurdle model fits reasonably well.
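As a final check, we can also look at a Q-Q plot of the quantile residuals from the hurdle model. These look fairly normal and show no suspicious departures from the model.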

qqrplot(hnb, main = "HNB")

Since the residuals are randomized, you can rerun the code a few times to get an impression of the variation. qqrplot() also has some arguments that let you explore this variation in a single plot.
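If you want to see that variation without relying on qqrplot(), one option is to redraw the randomized residuals several times and overlay the resulting Q-Q lines. The sketch below does this in base R for a Poisson GLM on simulated data; the data, the helper name draw_rqres(), and the choice of 20 replications are all assumptions made for illustration.

```r
# Overlaying several draws of randomized quantile residuals.
# Simulated data; draw_rqres() is a hypothetical helper for this sketch.
set.seed(123)
n <- 300
x <- runif(n)
y <- rpois(n, lambda = exp(1 + x))

fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)

# One draw of randomized quantile residuals for the fitted model.
draw_rqres <- function() {
  u <- runif(n, min = ppois(y - 1, mu), max = ppois(y, mu))
  qnorm(u)
}

# 20 replications, each sorted so it can be drawn as one Q-Q line.
reps <- replicate(20, sort(draw_rqres()))
theo <- qnorm(ppoints(n))

matplot(theo, reps, type = "l", lty = 1, col = "grey",
        xlab = "Theoretical quantiles", ylab = "Quantile residuals")
abline(0, 1, col = "red")
```

The width of the grey band gives a rough impression of how much of the pattern in a single Q-Q plot is due to the randomization alone.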