Integral of the normal PDF is not equal to 1 when using MATLAB's normpdf

Question

I am supposed to compute a PDF using the following methods:

bins = [20 23 31.5 57 62.5 89 130];  % classes of values of my random variable

mean = 23;   
std  = mean/2;
values = mean + std*randn(1000,1);  % the actual values of the RV

% method 1

[num, bins] = hist(values, bins);  % histogram on the previously defined bins
pdf2 = num/trapz(bins, num);
trapz(bins, pdf2)  % checking the integral under the curve, which is indeed 1
ans =
 1

% method 2
pdf1 = normpdf(bins, mean, std); % the MATLAB command for creating a normal PDF
trapz(bins, pdf1)  % this is NOT equal to 1
ans =
0.7069

However, if I define the bins as

bins = [0:46];

the results are

ans =
 1
ans =
0.9544

So I still do not get an integral of 1 for normpdf.

Why doesn't normpdf give a PDF whose integral equals 1? Is there something I am missing in the code above?

Answer 1

Hol*_*olt 5

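The problem is that you are missing a large part of the PDF. If you take bins = [0:46], you get the following curve:

[Figure: normal PDF evaluated on bins = 0:46]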

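This means you are missing everything in the regions x < 0 and x > 46, so the integral you compute does not run from -∞ to +∞ as you expect, but only from 0 to 46. Obviously, you will not get the correct answer.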

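Note that you have mean = 23 and std = 11.5, so bins = 0:46 is a band centered on the mean with a width of two standard deviations on each side. According to the 68-95-99.7 rule, about 95% of the values lie within this band, which is consistent with the 0.9544 you get.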
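The 68-95-99.7 rule can be checked directly: for a normal distribution, the mass within k standard deviations of the mean is erf(k/sqrt(2)). The following is a minimal sketch, not part of the original answer. The names mu, sigma and gauss are my own; gauss is a hand-written stand-in for normpdf so that the snippet runs without the Statistics Toolbox.

```matlab
% Assumed names (not from the original post): mu, sigma, gauss.
mu    = 23;
sigma = 11.5;

% Hand-written normal density, used here in place of normpdf.
gauss = @(x) exp(-0.5 * ((x - mu) / sigma).^2) / (sigma * sqrt(2*pi));

% Exact probability mass inside mu +/- k*sigma for k = 1, 2, 3.
k        = [1 2 3];
coverage = erf(k / sqrt(2));

% Numerical integral over the same band as bins = 0:46 (k = 2).
x     = 0:46;
area2 = trapz(x, gauss(x));

fprintf('exact coverage for k = 1, 2, 3: %.4f %.4f %.4f\n', coverage);
fprintf('trapz over 0:46:                %.4f\n', area2);
```

The trapz value over 0:46 should agree with the exact two-sigma coverage to about three decimal places, which suggests that the shortfall from 1 comes from the truncated tails and not from normpdf itself.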

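If you take bins = -11.5:57.5, you now have three standard deviations on each side, and 99.7% of the values fall within this band (MATLAB gives me 0.9973; the filled area is what you were missing with bins = 0:46):

[Figure: normal PDF on bins = -11.5:57.5, with the area missed by bins = 0:46 filled]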

Note that if you want to reach 1.000 with an error smaller than 10^-3, you need about 3.4 standard deviations:

>> bins = (mean - 3.4 * std):(mean + 3.4 * std) ;
>> pdf  = normpdf(bins, mean, std) ;
>> trapz(bins, pdf)
0.9993
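To see how the truncated integral approaches 1 as the band widens, the same computation can be repeated for several half-widths. This is only a sketch under my own assumptions: the list of half-widths ks is an arbitrary choice, and gauss is again a hand-written replacement for normpdf.

```matlab
% Assumed names (not from the original post): mu, sigma, gauss, ks, areas.
mu    = 23;
sigma = 11.5;
gauss = @(x) exp(-0.5 * ((x - mu) / sigma).^2) / (sigma * sqrt(2*pi));

ks    = [1 2 3 3.4 4];            % half-width of the band, in standard deviations
areas = zeros(size(ks));

for i = 1:numel(ks)
    % Unit-spaced grid covering mu - k*sigma ... mu + k*sigma.
    x        = (mu - ks(i) * sigma):(mu + ks(i) * sigma);
    areas(i) = trapz(x, gauss(x));
end

% Columns: half-width k, integral over the band, shortfall from 1.
disp([ks(:), areas(:), 1 - areas(:)]);
```

With these settings the shortfall should still be above 10^-3 at three standard deviations and below it at 3.4, in line with the figures quoted above.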

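Note that with bins = [20 23 31.5 57 62.5 89 130]; you have both a precision problem and a missing-values problem (your curve is the blue one; the red curve is generated with bins = 20:130):

[Figure: normal PDF on the coarse bins (blue) and on bins = 20:130 (red)]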

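Obviously, if you compute the area under the blue curve, you will not get the area under the red curve, and you will certainly not get anything close to the integral of the red curve between -∞ and +∞.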
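The two effects can be separated numerically by integrating the same density on three grids: the coarse bins from the question, a unit-spaced grid over the same range, and a fine grid that covers the tails. This is a sketch, not the original author's code. The names coarse, fine, wide and gauss are my own, and the six-sigma width and 0.1 step of the wide grid are arbitrary choices.

```matlab
% Assumed names (not from the original post): mu, sigma, gauss, coarse, fine, wide.
mu    = 23;
sigma = 11.5;
gauss = @(x) exp(-0.5 * ((x - mu) / sigma).^2) / (sigma * sqrt(2*pi));

coarse = [20 23 31.5 57 62.5 89 130];            % bins from the question
fine   = 20:130;                                 % same range, unit spacing
wide   = (mu - 6*sigma):0.1:(mu + 6*sigma);      % covers both tails

area_coarse = trapz(coarse, gauss(coarse));   % precision + truncation error
area_fine   = trapz(fine,   gauss(fine));     % truncation error only
area_wide   = trapz(wide,   gauss(wide));     % close to the full integral

fprintf('coarse bins: %.4f\n', area_coarse);
fprintf('20:130:      %.4f\n', area_fine);
fprintf('wide grid:   %.4f\n', area_wide);
```

The coarse grid reproduces the 0.7069 from the question. The unit-spaced grid over 20:130 gives roughly 0.60, because everything below x = 20 is cut off, so the coarse bins overestimate even the truncated integral. Only the wide grid comes out close to 1.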
