How to compute standard deviation errors with scipy.optimize.least_squares

Question

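I am comparing fits made with optimize.curve_fit and optimize.least_squares. With curve_fit I get the covariance matrix pcov as an output, and I can compute the standard deviation errors of the fitted variables from it: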

perr = np.sqrt(np.diag(pcov))

If I fit with least_squares, I do not get any covariance matrix as output, so I cannot compute the standard deviation errors of the fitted variables.

Here is my example:

#import modules
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import least_squares

noise = 0.5
N = 100
t = np.linspace(0, 4*np.pi, N)

# generate data
def generate_data(t, freq, amplitude, phase, offset, noise=0, n_outliers=0, random_state=0):
    #formula for data generation with noise and outliers
    y = np.sin(t * freq + phase) * amplitude + offset
    rnd = np.random.RandomState(random_state)
    error = noise * rnd.randn(t.size)
    outliers = rnd.randint(0, t.size, n_outliers)
    error[outliers] *= 10
    return y + error

#generate data
data = generate_data(t, 1, 3, 0.001, 0.5, noise, n_outliers=10)

#initial guesses
p0=np.ones(4)
x0=np.ones(4)

# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset

# create the function we want to fit for least-square
def my_sin_lsq(x, t, y):
    # x[0] = freq
    # x[1] = amplitude
    # x[2] = phase
    # x[3] = offset (same parameter order as my_sin)
    return (np.sin(t*x[0]+x[2])*x[1]+ x[3]) - y

# now do the fit for curve_fit
fit = curve_fit(my_sin, t, data, p0=p0)
print('Curve fit output:'+str(fit[0]))

#now do the fit for least_square
res_lsq = least_squares(my_sin_lsq, x0, args=(t, data))
print('Least_squares output:'+str(res_lsq.x))


# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)

#data_first_guess_lsq = x0[2]*np.sin(t*x0[0]+x0[1])+x0[3]
data_first_guess_lsq = my_sin(t, *x0)

# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
data_fit_lsq = my_sin(t, *res_lsq.x)

#calculation of residuals
residuals = data - data_fit
residuals_lsq = data - data_fit_lsq
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data-np.mean(data))**2)
ss_res_lsq = np.sum(residuals_lsq**2)
ss_tot_lsq = np.sum((data-np.mean(data))**2)

#R squared
r_squared = 1 - (ss_res/ss_tot)
r_squared_lsq = 1 - (ss_res_lsq/ss_tot_lsq)
print('R squared curve_fit is:'+str(r_squared))
print('R squared least_squares is:'+str(r_squared_lsq))

plt.figure()
plt.plot(t, data)
plt.title('curve_fit')
plt.plot(t, data_first_guess)
plt.plot(t, data_fit)
plt.plot(t, residuals)

plt.figure()
plt.plot(t, data)
plt.title('lsq')
plt.plot(t, data_first_guess_lsq)
plt.plot(t, data_fit_lsq)
plt.plot(t, residuals_lsq)

#error
perr = np.sqrt(np.diag(fit[1]))
print('The standard deviation errors for curve_fit are:' +str(perr))

I would be very grateful for any help. Best wishes

PS: I got a lot of input from this source and used parts of its code: Robust regression

Answer 1

小智 7

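The result returned by optimize.least_squares contains an attribute named jac. From the documentation:

jac : ndarray, sparse matrix or LinearOperator, shape (m, n)

Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.

This can be used to estimate the covariance matrix of the parameters using the following formula: Sigma = (J'J)^-1.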

J = res_lsq.jac
cov = np.linalg.inv(J.T.dot(J))

To find the variance of the parameters, one can then use:

var = np.sqrt(np.diagonal(cov))

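As far as I can tell, this answer contains two errors: 1. The covariance matrix needs to be multiplied by the RMS of the residuals. 2. The final result shown is the standard deviation, not the variance. See: /sf/answers/1529130851/ and /sf/answers/1040020901/ (4 upvotes)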
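A minimal sketch of the two corrections raised in the comment above, assuming unweighted residuals. The covariance (J'J)^-1 is scaled by the estimated residual variance, sum(r**2)/(m - n), which is the scaling curve_fit applies by default (absolute_sigma=False), and the square root of its diagonal is reported as standard deviations rather than variances. The synthetic data, the starting guess x0, and the names model and residuals are illustrative assumptions, not taken from the original post; curve_fit is called only as a cross-check.

```python
import numpy as np
from scipy.optimize import curve_fit, least_squares

def model(t, freq, amplitude, phase, offset):
    return np.sin(t * freq + phase) * amplitude + offset

def residuals(x, t, y):
    # Unweighted residuals: model prediction minus data
    return model(t, *x) - y

# Synthetic data (assumed values, no outliers)
rng = np.random.RandomState(0)
t = np.linspace(0, 4 * np.pi, 100)
y = model(t, 1.0, 3.0, 0.001, 0.5) + 0.5 * rng.randn(t.size)

x0 = np.array([1.0, 2.0, 0.0, 0.0])  # assumed starting guess near the solution
res = least_squares(residuals, x0, args=(t, y))

J = res.jac
dof = res.fun.size - res.x.size        # data points minus fitted parameters
s_sq = np.sum(res.fun**2) / dof        # estimated residual variance
cov = np.linalg.inv(J.T @ J) * s_sq    # scaled covariance matrix
perr = np.sqrt(np.diag(cov))           # standard deviations, not variances

# Cross-check against curve_fit, which applies the same scaling by default
popt, pcov = curve_fit(model, t, y, p0=x0)
perr_curve_fit = np.sqrt(np.diag(pcov))

print('least_squares perr:', perr)
print('curve_fit perr:    ', perr_curve_fit)
```

If both fits converge to the same minimum, the two printed vectors should agree closely; small differences come from the finite-difference Jacobians each routine computes.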

Answer 2

div*_*nex 6

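The SciPy routine optimize.least_squares requires the user to provide as input a function fun(...) that returns the vector of residuals. This is typically defined as

residuals = (data - model)/sigma

where data and model are vectors containing the data to be fitted and the corresponding model prediction for each data point, while sigma is the 1σ uncertainty on each data value.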
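As a rough illustration of that definition, a weighted residual function can be written as below. The straight-line model, the per-point uncertainties sigma, and the names line and weighted_residuals are hypothetical choices made for this sketch; the extra arrays are passed to the residual function through the args parameter of least_squares.

```python
import numpy as np
from scipy.optimize import least_squares

def line(x, slope, intercept):
    # Hypothetical model used only for this sketch
    return slope * x + intercept

def weighted_residuals(params, x, data, sigma):
    # Residuals in units of the 1-sigma uncertainty of each data point
    return (data - line(x, *params)) / sigma

rng = np.random.RandomState(1)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.1 + 0.05 * x                      # assumed known uncertainties, growing with x
data = line(x, 2.0, -1.0) + sigma * rng.randn(x.size)

res = least_squares(weighted_residuals, x0=[1.0, 0.0], args=(x, data, sigma))

chi2 = np.sum(res.fun**2)                   # res.fun holds the weighted residuals at the solution
print('fitted parameters:', res.x)
print('chi**2:', chi2)
```

Because the residuals are already divided by sigma, the sum of their squares is the chi**2 of the fit.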

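In this case, and assuming one can trust the input sigma uncertainties, one can use the output Jacobian matrix jac returned by least_squares to estimate the covariance matrix. Moreover, assuming the covariance matrix is diagonal, or simply ignoring the off-diagonal terms, one can also obtain the 1σ uncertainties perr on the model parameters (often called "formal errors") as follows (see Section 15.4.2 of Numerical Recipes, 3rd ed.):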

import numpy as np
from scipy import linalg, optimize

res = optimize.least_squares(...)

U, s, Vh = linalg.svd(res.jac, full_matrices=False)
tol = np.finfo(float).eps*s[0]*max(res.jac.shape)
w = s > tol
cov = (Vh[w].T/s[w]**2) @ Vh[w]  # robust covariance matrix
perr = np.sqrt(np.diag(cov))     # 1sigma uncertainty on fitted parameters

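The above code for obtaining the covariance matrix is formally identical to the simpler one below (as suggested by Alex), but its main advantage is that it works even when the Jacobian is close to degenerate, which is common in real-world least-squares fits.

cov = linalg.inv(res.jac.T @ res.jac)  # covariance matrix when jac not degenerate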
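To make the comparison concrete, here is a small sketch that applies both formulas to plain NumPy arrays standing in for res.jac. The helper name cov_from_jac and the random test matrices are assumptions made for illustration. For a well-conditioned Jacobian the two results should coincide; for a rank-deficient one, the SVD version drops the vanishing singular value and still returns finite numbers, whereas inverting J'J directly would either fail or produce meaninglessly large entries.

```python
import numpy as np
from scipy import linalg

def cov_from_jac(jac):
    # SVD-based (J'J)^-1 that discards singular values below a tolerance
    U, s, Vh = linalg.svd(jac, full_matrices=False)
    tol = np.finfo(float).eps * s[0] * max(jac.shape)
    w = s > tol
    return (Vh[w].T / s[w]**2) @ Vh[w]

rng = np.random.RandomState(2)

# Well-conditioned stand-in for res.jac: both formulas should agree
J_good = rng.randn(20, 3)
cov_svd = cov_from_jac(J_good)
cov_inv = linalg.inv(J_good.T @ J_good)
print('formulas agree:', np.allclose(cov_svd, cov_inv))

# Degenerate stand-in: the third column duplicates the first,
# so J'J is singular and a plain inverse is not meaningful
J_bad = J_good.copy()
J_bad[:, 2] = J_bad[:, 0]
cov_bad = cov_from_jac(J_bad)
print('SVD result finite:', np.all(np.isfinite(cov_bad)))
```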

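If one does not trust the input uncertainties sigma, one can still assume that the fit is good and estimate the data uncertainties from the fit itself. This corresponds to assuming chi**2/DOF=1, where DOF is the number of degrees of freedom. In this case, one can use the following lines to rescale the covariance matrix before computing the uncertainties:

chi2dof = np.sum(res.fun**2)/(res.fun.size - res.x.size)
cov *= chi2dof
perr = np.sqrt(np.diag(cov))    # 1sigma uncertainty on fitted parameters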
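The two conventions can be checked against curve_fit, which is what the question was comparing with. The sketch below assumes an exponential-decay model and constant sigma chosen purely for illustration (the names model and residuals and all numeric values are hypothetical). The unscaled covariance should match curve_fit with absolute_sigma=True, and the chi2dof-rescaled one should match absolute_sigma=False.

```python
import numpy as np
from scipy import linalg
from scipy.optimize import curve_fit, least_squares

def model(t, amplitude, rate):
    # Hypothetical model used only for this sketch
    return amplitude * np.exp(-rate * t)

def residuals(params, t, data, sigma):
    return (data - model(t, *params)) / sigma

rng = np.random.RandomState(3)
t = np.linspace(0.0, 5.0, 60)
sigma = np.full(t.size, 0.05)               # assumed 1-sigma uncertainties
data = model(t, 2.0, 0.7) + sigma * rng.randn(t.size)

p0 = [1.0, 1.0]
res = least_squares(residuals, p0, args=(t, data, sigma))

# Case 1: trust sigma as given
cov = linalg.inv(res.jac.T @ res.jac)
perr_abs = np.sqrt(np.diag(cov))

# Case 2: do not trust sigma, rescale so that chi**2/DOF = 1
chi2dof = np.sum(res.fun**2) / (res.fun.size - res.x.size)
perr_scaled = np.sqrt(np.diag(cov * chi2dof))

# Cross-check with the two curve_fit conventions
_, pcov_abs = curve_fit(model, t, data, p0=p0, sigma=sigma, absolute_sigma=True)
_, pcov_rel = curve_fit(model, t, data, p0=p0, sigma=sigma, absolute_sigma=False)

print('absolute:', perr_abs, np.sqrt(np.diag(pcov_abs)))
print('rescaled:', perr_scaled, np.sqrt(np.diag(pcov_rel)))
```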
