75 python numpy matplotlib curve-fitting linear-regression
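I'm trying to generate a linear regression for a scatter plot I have already produced, but my data is in list format, and every example of polyfit I can find uses arange. arange doesn't accept lists, though. I have searched high and low for how to convert a list to an array, and nothing seems clear. Am I missing something?
Following on from that, how can I best use my list of integers as input to polyfit?
Here is the polyfit example I am following: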
from pylab import * 
x = arange(data) 
y = arange(data) 
m,b = polyfit(x, y, 1) 
plot(x, y, 'yo', x, m*x+b, '--k') 
show() 
    DSM*_*DSM 162
arange generates lists (well, numpy arrays); type help(np.arange) for the details. You don't need to call it on existing lists.
>>> import numpy as np
>>> x = [1,2,3,4]
>>> y = [3,5,7,9] 
>>> 
>>> m,b = np.polyfit(x, y, 1)
>>> m
2.0000000000000009
>>> b
0.99999999999999833
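On the list-to-array part of the question, here is a minimal sketch, assuming the data are plain Python lists of numbers (the values are illustrative). `np.asarray` converts a list to an array, and `np.polyfit` gives the same coefficients for either form.

```python
import numpy as np

# Illustrative data held in plain Python lists
x_list = [1, 2, 3, 4]
y_list = [3, 5, 7, 9]

# np.asarray turns a list into a NumPy array
x = np.asarray(x_list, dtype=float)
y = np.asarray(y_list, dtype=float)

# polyfit accepts either form and returns the same coefficients
m_list, b_list = np.polyfit(x_list, y_list, 1)
m, b = np.polyfit(x, y, 1)

# Converting once keeps expressions like m * x + b as plain array arithmetic
y_fit = m * x + b
print(m, b)
print(y_fit)
```

Converting up front is optional for the fit itself, but it is convenient when the fitted line is evaluated or plotted afterwards.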
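I should add that I tend to use poly1d here rather than writing out "m*x + b" and the higher-order equivalents, so my version of your code would look something like this: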
import numpy as np
import matplotlib.pyplot as plt
x = [1,2,3,4]
y = [3,5,7,10] # 10, not 9, so the fit isn't perfect
coef = np.polyfit(x,y,1)
poly1d_fn = np.poly1d(coef) 
# poly1d_fn is now a function which takes in x and returns an estimate for y
plt.plot(x,y, 'yo', x, poly1d_fn(x), '--k')
plt.xlim(0, 5)
plt.ylim(0, 12)
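To illustrate the "higher-order equivalents" remark, here is a small sketch that reuses the same sample data. The names `line_fn`, `quad_fn`, and `x_new` are made up for this example; the point is that the call pattern stays the same when the degree changes.

```python
import numpy as np

x = [1, 2, 3, 4]
y = [3, 5, 7, 10]

# Degree 1: straight line, coefficients ordered highest power first
line_fn = np.poly1d(np.polyfit(x, y, 1))

# Degree 2: same call pattern, no need to write a*x**2 + b*x + c by hand
quad_fn = np.poly1d(np.polyfit(x, y, 2))

# Both objects are callable on scalars or sequences
x_new = np.linspace(0, 5, 6)
print(line_fn(x_new))
print(quad_fn(2.5))
print(line_fn.coeffs)  # [slope, intercept]
```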
        Geo*_*lis 35
This code:
from scipy.stats import linregress
linregress(x,y) #x and y are arrays or lists.
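returns a tuple-like result containing the following:
slope : float
Slope of the regression line.
intercept : float
Intercept of the regression line.
r-value : float
Correlation coefficient.
p-value : float
Two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero.
stderr : float
Standard error of the estimate.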
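A short sketch of reading those values, assuming a reasonably recent SciPy in which the result exposes named attributes (`slope`, `intercept`, `rvalue`, `pvalue`, `stderr`). The data are illustrative.

```python
from scipy.stats import linregress

x = [1, 2, 3, 4]
y = [3, 5, 7, 10]

result = linregress(x, y)

# Fields can be read by name...
print(result.slope, result.intercept)
print(result.rvalue ** 2)  # coefficient of determination
print(result.pvalue, result.stderr)

# ...or unpacked positionally in the order listed above
slope, intercept, r_value, p_value, std_err = result
```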
tdy*_*tdy 18
Use statsmodels.api.OLS to get a detailed breakdown of the fit/coefficients/residuals:
import statsmodels.api as sm
df = sm.datasets.get_rdataset('Duncan', 'carData').data
y = df['income']
x = df['education']
model = sm.OLS(y, sm.add_constant(x))
results = model.fit()
print(results.params)
# const        10.603498 <- intercept
# education     0.594859 <- slope
# dtype: float64
print(results.summary())
#                             OLS Regression Results                            
# ==============================================================================
# Dep. Variable:                 income   R-squared:                       0.525
# Model:                            OLS   Adj. R-squared:                  0.514
# Method:                 Least Squares   F-statistic:                     47.51
# Date:                Thu, 28 Apr 2022   Prob (F-statistic):           1.84e-08
# Time:                        00:02:43   Log-Likelihood:                -190.42
# No. Observations:                  45   AIC:                             384.8
# Df Residuals:                      43   BIC:                             388.5
# Df Model:                           1                                         
# Covariance Type:            nonrobust                                         
# ==============================================================================
#                  coef    std err          t      P>|t|      [0.025      0.975]
# ------------------------------------------------------------------------------
# const         10.6035      5.198      2.040      0.048       0.120      21.087
# education      0.5949      0.086      6.893      0.000       0.421       0.769
# ==============================================================================
# Omnibus:                        9.841   Durbin-Watson:                   1.736
# Prob(Omnibus):                  0.007   Jarque-Bera (JB):               10.609
# Skew:                           0.776   Prob(JB):                      0.00497
# Kurtosis:                       4.802   Cond. No.                         123.
# ==============================================================================
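For intuition about what the model above estimates, here is a NumPy-only sketch of ordinary least squares with an explicit constant column. It is not statsmodels' implementation; the small dataset is illustrative rather than the Duncan data, and `np.linalg.lstsq` stands in for the fitting step.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])

# Design matrix with a leading column of ones, mirroring sm.add_constant(x)
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimise ||X @ params - y||^2
params, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = params  # constant first, then the slope

# R-squared from the residual and total variance
residuals = y - X @ params
r_squared = 1 - residuals.var() / y.var()
print(intercept, slope, r_squared)
```

Because the constant column comes first, the parameters come out as intercept then slope, the same order as `results.params` in the output above.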
To plot the best-fit line, just pass the slope m and intercept b into the new plt.axline:
import matplotlib.pyplot as plt
# extract intercept b and slope m
b, m = results.params
# plot y = m*x + b
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
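A self-contained sketch of the same idea, with a fallback for Matplotlib versions older than 3.3, where `axline` is not available. The data, the `Agg` backend, and the output filename `linear_fit.png` are assumptions made so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])
m, b = np.polyfit(x, y, deg=1)
label = f"$y = {m:.1f}x {b:+.1f}$"

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")

if hasattr(ax, "axline"):
    # Infinite line through (0, b) with slope m
    ax.axline(xy1=(0, b), slope=m, color="k", linestyle="--", label=label)
else:
    # Fallback: finite segment spanning the data range
    xs = np.array([x.min(), x.max()])
    ax.plot(xs, m * xs + b, "--k", label=label)

ax.legend()
fig.savefig("linear_fit.png")  # arbitrary filename for this example
```

Unlike a segment drawn with `plot`, the `axline` result keeps extending as the axis limits change.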
Note that the slope m and intercept b can easily be extracted from any of the common regression methods:
import numpy as np
m, b = np.polyfit(x, y, deg=1)
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
from scipy import stats
m, b, *_ = stats.linregress(x, y)
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
import statsmodels.api as sm
b, m = sm.OLS(y, sm.add_constant(x)).fit().params
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
sklearn.linear_model.LinearRegression
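import numpy as np
from sklearn.linear_model import LinearRegression
# scikit-learn expects a 2-D feature matrix of shape (n_samples, 1)
reg = LinearRegression().fit(np.asarray(x).reshape(-1, 1), y)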
b = reg.intercept_
m = reg.coef_[0]
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
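A quick consistency check, assuming NumPy and SciPy are installed, that two of these methods return the same line. The sample values are borrowed from another answer in this thread. The return order differs between libraries: `np.polyfit` and `linregress` give the slope first, while the statsmodels parameters above give the intercept first.

```python
import numpy as np
from scipy import stats

x = np.array([1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6])
y = np.array([10.35, 12.3, 13, 14.0, 16, 17, 18.2, 20, 20.7, 22.5])

# Both calls return the slope first, then the intercept
m_np, b_np = np.polyfit(x, y, deg=1)
m_sp, b_sp, *_ = stats.linregress(x, y)

# The two estimates should agree to floating-point precision
print(np.isclose(m_np, m_sp), np.isclose(b_np, b_sp))
```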
Anonymous 8
import numpy as np
import matplotlib.pyplot as plt 
from scipy import stats
x = np.array([1.5,2,2.5,3,3.5,4,4.5,5,5.5,6])
y = np.array([10.35,12.3,13,14.0,16,17,18.2,20,20.7,22.5])
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
mn=np.min(x)
mx=np.max(x)
x1=np.linspace(mn,mx,500)
y1=gradient*x1+intercept
plt.plot(x,y,'ob')
plt.plot(x1,y1,'-r')
plt.show()
Use this.
Anonymous 6
George's answer pairs nicely with matplotlib's axline, which draws an infinite line.
from scipy.stats import linregress
import matplotlib.pyplot as plt
reg = linregress(x, y)
plt.axline(xy1=(0, reg.intercept), slope=reg.slope, linestyle="--", color="k")