Ric*_*ton 4 statistics matlab r
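Most functions for generating lognormally distributed random numbers take the mean and standard deviation of the associated normal distribution as parameters.
My problem is that I only know the mean and the coefficient of variation of the lognormal distribution. It is reasonably straightforward to derive the parameters the standard functions need from what I have:
If mu and sigma are the mean and standard deviation of the associated normal distribution, we know that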
coeffOfVar^2 = variance / mean^2
             = (exp(sigma^2) - 1) * exp(2*mu + sigma^2) / exp(mu + sigma^2/2)^2
             = exp(sigma^2) - 1
We can rearrange this to
sigma = sqrt(log(coeffOfVar^2 + 1))
We also know that
mean = exp(mu + sigma^2/2)
which rearranges to
mu = log(mean) - sigma^2/2
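The two rearrangements above can be checked without any sampling by converting (mean, coefficient of variation) to (mu, sigma) and then recomputing the theoretical mean and coefficient of variation from the result. This is only a sketch: the helper names `lnorm_params`, `lnorm_mean` and `lnorm_cv` are made up for illustration and are not part of base R.

```r
# Hypothetical helper: convert (mean, CV) of a lognormal into (mu, sigma)
# of the associated normal, using the two rearranged formulas above.
lnorm_params <- function(mean, coeffOfVar) {
  sigma <- sqrt(log(coeffOfVar^2 + 1))
  mu <- log(mean) - sigma^2 / 2
  list(mu = mu, sigma = sigma)
}

# Theoretical mean and CV recomputed from (mu, sigma); no random numbers involved.
lnorm_mean <- function(mu, sigma) exp(mu + sigma^2 / 2)
lnorm_cv <- function(sigma) sqrt(exp(sigma^2) - 1)

for (cv in c(0.5, 5, 50)) {
  p <- lnorm_params(2, cv)
  cat("cv =", cv, " sigma =", p$sigma,
      " mean back =", lnorm_mean(p$mu, p$sigma),
      " cv back =", lnorm_cv(p$sigma), "\n")
}
```

If the round trip returns the inputs for both small and large coefficients of variation, the algebra is not the source of any discrepancy seen in simulated samples.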
Here is my R implementation
rlnorm0 <- function(mean, coeffOfVar, n = 1e6)
{
  sigma <- sqrt(log(coeffOfVar^2 + 1))
  mu <- log(mean) - sigma^2 / 2
  rlnorm(n, mu, sigma)
}
It works for small coefficients of variation
r1 <- rlnorm0(2, 0.5)
mean(r1) # 2.000095
sd(r1) / mean(r1) # 0.4998437
but not for larger values
r2 <- rlnorm0(2, 50)
mean(r2) # 2.048509
sd(r2) / mean(r2) # 68.55871
To check that this isn't an R-specific issue, I reimplemented it in MATLAB (using the Statistics Toolbox).
function y = lognrnd0(mean, coeffOfVar, sizeOut)
    if nargin < 3 || isempty(sizeOut)
        sizeOut = [1e6 1];
    end
    sigma = sqrt(log(coeffOfVar.^2 + 1));
    mu = log(mean) - sigma.^2 ./ 2;
    y = lognrnd(mu, sigma, sizeOut);
end
r1 = lognrnd0(2, 0.5);
mean(r1) % 2.0013
std(r1) ./ mean(r1) % 0.5008
r2 = lognrnd0(2, 50);
mean(r2) % 1.9611
std(r2) ./ mean(r2) % 22.61
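Same problem. The question is, why is this happening? Is the standard deviation simply not robust when the variation is that wide, or have I made a mistake somewhere?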
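The results are not surprising. For distributions with large kurtosis, the variance of the sample variance is roughly mu4/N, where mu4 is the fourth central moment of the distribution. For a lognormal, mu4 depends exponentially on sigma^2, which means that for large enough values of sigma the sample variance will be all over the place relative to the true variance. This is exactly what you observed. In your example, measured relative to the square of the true variance, this is about kurtosis/N ~ (coeffOfVar^8)/N ~ 50^8/1e6 ~ 4e7.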
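That order-of-magnitude estimate can be reproduced directly from the closed-form lognormal kurtosis. The sketch below uses the large-N approximation Var(s^2) / mu2^2 ~ (kurtosis - 1) / N, which is an assumption that drops the finite-N correction terms; the function names are made up for illustration.

```r
# Kurtosis mu4 / mu2^2 of a lognormal, written in terms of w = exp(sigma^2).
lnorm_kurtosis <- function(sigma) {
  w <- exp(sigma^2)
  w^4 + 2 * w^3 + 3 * w^2 - 3
}

# Large-n approximation to the relative variance of the sample variance:
# Var(s^2) / mu2^2 ~ (kurtosis - 1) / n.
rel_var_of_samp_var <- function(coeffOfVar, n) {
  sigma <- sqrt(log(coeffOfVar^2 + 1))
  (lnorm_kurtosis(sigma) - 1) / n
}

n <- 1e6
for (cv in c(0.5, 5, 50)) {
  rv <- rel_var_of_samp_var(cv, n)
  cat("cv =", cv, " relative variance of s^2 =", rv,
      " relative sd =", sqrt(rv), "\n")
}
```

For a coefficient of variation of 0.5 the relative spread of the sample variance is tiny at N = 1e6, while for 50 it is on the order of 4e7, in line with the estimate above. A single sample of a million points therefore says very little about the true variance.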
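For a derivation of the variance of the sample variance, see http://mathworld.wolfram.com/SampleVarianceDistribution.html. Below is some code that illustrates these ideas more precisely. Note the large values of both the variance of the sample variance and its theoretical expected value, even for coeffOfVar = 5.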
exp.var.of.samp.var <- function(n,mu2,mu4){
  (n-1)*((n-1)*mu4-(n-3)*mu2^2)/n^3
}
mu2.lnorm <- function(mu,sigma){
  (exp(sigma^2)-1)*exp(2*mu+sigma^2)
}
mu4.lnorm <- function(mu,sigma){
  mu2.lnorm(mu,sigma)^2*(exp(4*sigma^2)+2*exp(3*sigma^2)+3*exp(2*sigma^2)-3)
}
exp.var.lnorm.var <- function(n,mu,sigma){
  exp.var.of.samp.var(n,mu2.lnorm(mu,sigma),mu4.lnorm(mu,sigma))
}
exp.var.norm.var <- function(n,mu,sigma){
  exp.var.of.samp.var(n,sigma^2,3*sigma^4)
}
coeffOfVar <- 5
mean <- 2
sigma <- sqrt(log(coeffOfVar^2 + 1)) # gives sigma=1.805020
mu <- log(mean) - sigma^2 / 2 # mu=-0.935901
n <- 1e4
m <- 1e4
## Get variance of sample variance for lognormal distribution:
var.trial <- replicate(m,var(rlnorm(n, mu, sigma)))
cat("samp. variance (mean of",m,"trials):",mean(var.trial),"\n")
cat("theor. variance:",mu2.lnorm(mu,sigma),"\n")
cat("variance of the sample var:",var(var.trial),"\n")
cat("expected variance of the sample var:",exp.var.lnorm.var(n,mu,sigma),"\n")
## Output:
## samp. variance (mean of 10000 trials): 105.7192
## theor. variance: 100
## variance of the sample var: 350997.7
## expected variance of the sample var: 494053.2
## Do this with normal distribution:
var.trial <- replicate(m,var(rnorm(n, mu, sigma)))
cat("samp. variance (mean of",m,"trials):",mean(var.trial),"\n")
cat("theor. variance:",sigma^2,"\n")
cat("variance of the sample var:",var(var.trial),"\n")
cat("expected variance of the sample var:",exp.var.norm.var(n,mu,sigma),"\n")
## Output:
## samp. variance (mean of 10000 trials): 3.257944
## theor. variance: 3.258097
## variance of the sample var: 0.002166131
## expected variance of the sample var: 0.002122826
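A complementary check, which is a suggestion and not part of the answer above, is to validate the generator on the log scale. Since log(x) is normally distributed with mean mu and standard deviation sigma, its sample moments settle down quickly even when those of x do not. The seed and the variable name `target_mean` below are arbitrary choices for this sketch.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

coeffOfVar <- 50
target_mean <- 2
sigma <- sqrt(log(coeffOfVar^2 + 1))
mu <- log(target_mean) - sigma^2 / 2

x <- rlnorm(1e6, mu, sigma)

# log(x) is normal with mean mu and sd sigma, so these estimates
# are stable even though sd(x) / mean(x) is not.
cat("mu:   ", mu, " vs mean(log(x)):", mean(log(x)), "\n")
cat("sigma:", sigma, " vs sd(log(x)):  ", sd(log(x)), "\n")
```

Close agreement here suggests the parameter conversion and the generator are fine, and that the erratic coefficient of variation comes from the heavy tail of the lognormal itself.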