I am doing a linear regression with StatsModels like this:
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
n = 100
x = np.linspace(0, 10, n)
e = np.random.normal(size=n)
y = 1 + 0.5*x + 2*e
X = sm.add_constant(x)
re = sm.OLS(y, X).fit()
print(re.summary())
prstd, iv_l, iv_u = wls_prediction_std(re)
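My question is: are iv_l and iv_u the lower and upper bounds of the confidence interval or of the prediction interval?
How do I get the other one?
I need both the confidence and the prediction intervals for all points, to make a plot.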
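I am trying to obtain a confidence interval on an exponential fit to some x,y data (available here). This is the MWE I have for finding the best exponential fit to the data: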
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Read data.
x, y = np.loadtxt('exponential_data.dat', unpack=True)
def func(x, a, b, c):
    '''Exponential 3-param function.'''
    return a * np.exp(b * x) + c
# Find best fit.
popt, pcov = curve_fit(func, x, y)
print(popt)
# Plot data and best fit curve.
plt.scatter(x, y)
x = np.linspace(11, 23, 100)
plt.plot(x, func(x, *popt), c='r')
plt.show()
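This produces a plot of the data with the best-fit curve drawn in red.

How can I obtain a 95% (or some other value) confidence interval on this fit, preferably using pure Python, NumPy or SciPy (the packages I already have installed)?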
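A possible NumPy/SciPy-only approach is the delta method: propagate the parameter covariance pcov returned by curve_fit through the gradient of the model to get an approximate variance of the fitted curve at each x. The sketch below is only that, a sketch. Since exponential_data.dat is not available here, it uses synthetic stand-in data, and the true parameters, noise level, starting guess p0 and seed are all assumptions of mine.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t

def func(x, a, b, c):
    """Exponential 3-param function (same as in the question)."""
    return a * np.exp(b * x) + c

# Synthetic stand-in for exponential_data.dat (hypothetical values).
rng = np.random.default_rng(0)
x = np.linspace(11, 23, 40)
y = func(x, 1.0, 0.2, 5.0) + rng.normal(scale=2.0, size=x.size)

# p0 is an assumed starting guess; exponential fits are sensitive to it.
popt, pcov = curve_fit(func, x, y, p0=(1.0, 0.2, 5.0))

# Gradient of the model with respect to (a, b, c) on a plotting grid.
xs = np.linspace(11, 23, 100)
a, b, c = popt
J = np.column_stack([np.exp(b * xs), a * xs * np.exp(b * xs), np.ones_like(xs)])

# Delta method: var[f(x)] is approximately J(x) @ pcov @ J(x).T for each x.
var_fit = np.einsum("ij,jk,ik->i", J, pcov, J)

# 95% band using Student's t with n - p degrees of freedom.
tval = t.ppf(0.975, x.size - len(popt))
fit = func(xs, *popt)
half_width = tval * np.sqrt(var_fit)
lower, upper = fit - half_width, fit + half_width
```

The band can then be drawn with plt.fill_between(xs, lower, upper, alpha=0.3). This is a linearized approximation of the confidence band for the fitted curve, not a prediction band for new data points, and it can be poor when the model is strongly nonlinear in the parameters.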
I have two arrays of data, for example heights and weights:
import numpy as np, matplotlib.pyplot as plt
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
plt.plot(heights,weights,'bo')
plt.show()
I would like to make a plot similar to this one:
http://www.sas.com/en_us/software/analytics/stat.html#m=screenshot6
Any ideas are appreciated.
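Assuming the target plot is a scatter plot with a fitted line plus confidence and prediction bands (I cannot check the linked screenshot), the bands for a simple linear fit can be computed from the textbook formulas with NumPy and SciPy alone. The variable names below (xs, ci, pi and so on) are my own.

```python
import numpy as np
from scipy import stats

heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

n = heights.size
slope, intercept = np.polyfit(heights, weights, 1)

# Residual standard error with n - 2 degrees of freedom.
resid = weights - (slope * heights + intercept)
s = np.sqrt(np.sum(resid**2) / (n - 2))

xbar = heights.mean()
sxx = np.sum((heights - xbar) ** 2)

# Evaluate the line and both half-widths on a plotting grid.
xs = np.linspace(heights.min(), heights.max(), 100)
fit = slope * xs + intercept
tval = stats.t.ppf(0.975, n - 2)

# Half-width of the 95% confidence band for the mean response.
ci = tval * s * np.sqrt(1 / n + (xs - xbar) ** 2 / sxx)
# Half-width of the 95% prediction band for a new observation.
pi = tval * s * np.sqrt(1 + 1 / n + (xs - xbar) ** 2 / sxx)
```

To draw it, plt.fill_between(xs, fit - pi, fit + pi, alpha=0.2) and plt.fill_between(xs, fit - ci, fit + ci, alpha=0.4) on top of the existing plt.plot(heights, weights, 'bo') should give the two shaded bands.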
I have a sample time series dataframe:
import pandas as pd
df = pd.DataFrame({'year': ['1990','1991','1992','1993','1994','1995','1996',
                            '1997','1998','1999','2000'],
                   'count': [96,184,148,154,160,149,124,274,322,301,300]})
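I want a linear regression line with a confidence band. I managed to plot the regression line, but I am finding it hard to plot the confidence interval band in the figure. Here is my code snippet for the linear regression plot: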
import matplotlib.pyplot as plt
from matplotlib import ticker
from sklearn.linear_model import LinearRegression
# Assumes df has a DatetimeIndex and a 'date_ordinal' column; neither is created in the snippet shown.
X = df.date_ordinal.values.reshape(-1, 1)
y = df['count'].values.reshape(-1, 1)
reg = LinearRegression()
reg.fit(X, y)
predictions = reg.predict(X)
fig, ax = plt.subplots()
plt.scatter(X, y, color='blue', alpha=0.5)
plt.plot(X, predictions, alpha=0.5, color='black', label=r'$N$' + '= {:.2f}t + {:.2e}\n'.format(reg.coef_[0][0], reg.intercept_[0]))
plt.ylabel('count($N$)')
plt.xlabel(r'Year(t)')
plt.legend()
# Use scientific notation on the y axis.
formatter = ticker.ScalarFormatter(useMathText=True)
formatter.set_scientific(True)
formatter.set_powerlimits((-1, 1))
ax.yaxis.set_major_formatter(formatter)
plt.xticks(ticks=df.date_ordinal[::5], labels=df.index.year[::5])
plt.grid()
plt.show()
plt.clf()
This gives me a nice linear regression plot for the time series.
Problem and desired output
However, I also need the confidence interval …
python time-series matplotlib linear-regression scikit-learn
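As far as I know, scikit-learn's LinearRegression does not report confidence intervals, so one option is to estimate the band by bootstrapping. The sketch below swaps in np.polyfit for LinearRegression (for a straight line it should give the same least-squares fit) and uses the years as plain numbers instead of date ordinals. The number of resamples, the seed and all variable names are assumptions of mine.

```python
import numpy as np

# Data from the question, with years as plain numbers (an assumption;
# the question uses a date_ordinal column that is not shown).
years = np.arange(1990, 2001)
counts = np.array([96, 184, 148, 154, 160, 149, 124, 274, 322, 301, 300], dtype=float)

# Centre the years to keep the least-squares problem well conditioned.
t = years - years.mean()
slope, intercept = np.polyfit(t, counts, 1)
fit = slope * t + intercept

# Pairs bootstrap: resample (year, count) pairs with replacement, refit,
# and record the refitted line at every original year.
rng = np.random.default_rng(0)
n_boot = 2000
boot = np.empty((n_boot, t.size))
for i in range(n_boot):
    idx = rng.integers(0, t.size, t.size)
    b1, b0 = np.polyfit(t[idx], counts[idx], 1)
    boot[i] = b1 * t + b0

# Pointwise 95% percentile band for the regression line.
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
```

The band can be added to the existing figure with something like plt.fill_between(years, lower, upper, alpha=0.3). Note that resampling pairs treats the observations as independent, so for a strongly autocorrelated time series the band may be too narrow.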