kil*_*les 6 python uncertainty scikit-learn gaussian-process
I have a set of observations, f_i = f(x_i), and I want to build a probabilistic surrogate, f(x) ~ N[mu(x), sigma(x)], where N is the normal distribution. Each observed output f_i has an associated measurement uncertainty sigma_i. I want to incorporate these measurement uncertainties into the surrogate, so that mu(x_i) predicts the observed value f_i and the predicted standard deviation sigma(x_i) contains the uncertainty epsilon_i in the observed output.
The only way I can think of to achieve this is a combination of Monte Carlo sampling and Gaussian process modeling. Ideally this would be done with a single Gaussian process and no Monte Carlo samples, but I haven't been able to make that work.
Below I show three attempts at my goal. The first two avoid Monte Carlo sampling, but the uncertainty band around the predicted mean f(x_i) does not contain epsilon(x_i). The third uses Monte Carlo sampling and accomplishes what I want.
Is there a way to create a Gaussian process whose mean predicts the mean observed output, and whose uncertainty contains the uncertainty in the observed outputs, without this Monte Carlo approach?
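For reference, the heteroscedastic-noise GP posterior being asked about can be written directly in numpy. This is a minimal sketch assuming a fixed RBF kernel; the length scale and variance below are illustrative placeholders, not fitted hyperparameters:

```python
import numpy as np

def rbf(a, b, length_scale=20.0, variance=4.0):
    # squared-exponential kernel: variance * exp(-(a - b)^2 / (2 * length_scale^2))
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

xs = np.array([44.0, 77.0, 125.0])
fs = np.array([8.64, 10.73, 12.13])
errs = np.array([0.1, 0.2, 0.3])          # per-observation noise std
finex = np.linspace(20, 200, 200)

# heteroscedastic noise enters only on the diagonal of the training covariance
K = rbf(xs, xs) + np.diag(errs ** 2)
Ks = rbf(finex, xs)

m = fs.mean()                              # constant prior mean
weights = np.linalg.solve(K, fs - m)
mu = m + Ks @ weights                      # posterior mean
cov = rbf(finex, finex) - Ks @ np.linalg.solve(K, Ks.T)
std_f = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # latent-function std
```

Note that `std_f` is the uncertainty of the surrogate itself; the predictive std for a fresh noisy measurement at x_i would be `sqrt(std_f(x_i)**2 + errs_i**2)`, which is where the observation noise re-enters the band.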
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, ExpSineSquared, WhiteKernel
# given a set of inputs, x_i, and corresponding outputs, f_i, I want to make a surrogate f(x).
# each f_i is measured with a different instrument, that has a different uncertainty.
# measured inputs
xs = np.array([44, 77, 125])
# measured outputs
fs = np.array([8.64, 10.73, 12.13])
# uncertainty in measured outputs
errs = np.array([0.1, 0.2, 0.3])
# inputs to predict
finex = np.linspace(20, 200, 200)
#############
### Approach 1: uncertainty in the kernel
# - the kernel is constant and cannot change as a function of the input
# - measurement uncertainty can be incorporated with a WhiteKernel
# - the white-noise level is pinned to the average of the observation errors
# RBF + whitenoise kernel
kernel = 1 * RBF(length_scale=9, length_scale_bounds=(10, 1e3)) + WhiteKernel(errs.mean(), noise_level_bounds=(errs.mean() - 1e-8, errs.mean() + 1e-8))
gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9, normalize_y=True)
gaussian_process.fit((np.atleast_2d(xs).T), (fs))
mu, std = gaussian_process.predict((np.atleast_2d(finex).T), return_std=True)
plt.scatter(xs, fs, zorder=3, s=30)
plt.fill_between(finex, (mu - std), (mu + std), facecolor='grey')
plt.plot(finex, mu, c='w')
plt.errorbar(xs, fs, yerr=errs, ls='none')
plt.xlabel('input')
plt.ylabel('output')
plt.title('White Noise Kernel - assumes uniform sensor error')
plt.savefig('gp_whitenoise')
plt.clf()
####################
### Approach 2: incorporate measurement uncertainty in the likelihood function
# - the likelihood can be altered through the alpha parameter
# - this assumes Gaussian uncertainty in the measured output
kernel = 1 * RBF(length_scale=9, length_scale_bounds=(10, 1e3))
gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9, normalize_y=True, alpha=errs)
gaussian_process.fit((np.atleast_2d(xs).T), (fs))
mu, std = gaussian_process.predict((np.atleast_2d(finex).T), return_std=True)
plt.scatter(xs, fs, zorder=3, s=30)
plt.fill_between(finex, (mu - std), (mu + std), facecolor='grey')
plt.plot(finex, mu, c='w')
plt.errorbar(xs, fs, yerr=errs, ls='none')
plt.xlabel('input')
plt.ylabel('output')
plt.title('uncertainty in likelihood - assumes measurements may be inaccurate')
plt.savefig('gp_alpha')
plt.clf()
####################
### Approach 3: Monte Carlo over measurement uncertainty + GP
# - the Gaussian process represents uncertainty in constructing the surrogate f(x)
# - the uncertainty in the observed outputs is propagated with Monte Carlo
# - downside: less computationally efficient, no analytic solution for the mean or uncertainty
kernel = 1 * RBF(length_scale=9, length_scale_bounds=(10, 1e3))
posterior_history = np.zeros((finex.size, 100 * 50))
for sample in range(100):
    simulatedSamples = fs + np.random.normal(0, errs)
    gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9, normalize_y=True)
    gaussian_process.fit(np.atleast_2d(xs).T, simulatedSamples)
    posterior_sample = gaussian_process.sample_y(np.atleast_2d(finex).T, 50)
    plt.plot(finex, posterior_sample, c='orange', alpha=0.005)
    posterior_history[:, sample * 50 : (sample + 1) * 50] = posterior_sample
plt.plot(finex, posterior_history.mean(1), c='w')
plt.fill_between(finex, posterior_history.mean(1) - posterior_history.std(1), posterior_history.mean(1) + posterior_history.std(1), facecolor='grey', alpha=1, zorder=5)
plt.scatter(xs, fs, zorder=6, s=30)
plt.errorbar(xs, fs, yerr=errs, ls='none', zorder=6)
plt.xlabel('input')
plt.ylabel('output')
plt.title('Monte Carlo + RBF Gaussian Process. Accurate but expensive.')
plt.savefig('gp_monteCarlo')
plt.clf()
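The pooled std computed from `posterior_history` in approach 3 is implicitly an application of the law of total variance: the variance over all draws equals the mean of the within-sample variances plus the variance of the per-sample means. A small sketch with synthetic draws (the shapes mirror the 100 outer Monte Carlo samples × 50 GP draws above; the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for posterior_history at one x: 100 outer MC samples,
# each contributing 50 GP draws around its own mean
draws = rng.normal(loc=rng.normal(0.0, 1.0, (1, 100)), scale=0.5, size=(50, 100))
pooled = draws.reshape(-1)

# law of total variance: Var[f] = E[Var[f | sample]] + Var[E[f | sample]]
within = draws.var(axis=0).mean()     # average GP (surrogate) variance
between = draws.mean(axis=0).var()    # variance induced by measurement noise
total = pooled.var()
```

This decomposition is also why the Monte Carlo band is wider than the single-GP bands in approaches 1 and 2: the `between` term carries the observation uncertainty.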
Using the second approach, with only a small change to alpha:
kernel = 1 * RBF(length_scale=9, length_scale_bounds=(10, 1e3))
gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9, normalize_y=True, alpha=errs**2)
gaussian_process.fit((np.atleast_2d(xs).T), (fs))
mu, std = gaussian_process.predict((np.atleast_2d(finex).T), return_std=True)
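One caveat worth knowing: with `alpha`, sklearn's `predict(return_std=True)` reports the latent-function uncertainty and does not add the observation noise back at prediction time. To get a band that contains the measurement uncertainty, the noise can be added manually. A sketch, where interpolating the per-point noise std onto the prediction grid is my own assumption about how the noise varies between observations:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

xs = np.array([44, 77, 125])
fs = np.array([8.64, 10.73, 12.13])
errs = np.array([0.1, 0.2, 0.3])
finex = np.linspace(20, 200, 200)

kernel = 1 * RBF(length_scale=9, length_scale_bounds=(10, 1e3))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9,
                              normalize_y=True, alpha=errs**2)
gp.fit(np.atleast_2d(xs).T, fs)
mu, std_f = gp.predict(np.atleast_2d(finex).T, return_std=True)

# interpolate the per-observation noise std to the prediction grid (assumption),
# then add it back in quadrature to get the band for a new noisy observation
noise = np.interp(finex, xs, errs)
std_y = np.sqrt(std_f**2 + noise**2)
```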