Linear regression with matplotlib / numpy

75 python numpy matplotlib curve-fitting linear-regression

I'm trying to generate a linear regression on a scatter plot I have generated; however, my data is in list format, and all of the examples I can find of using polyfit require using arange. arange doesn't accept lists, though. I have searched high and low about how to convert a list to an array and nothing seems clear. Am I missing something?

Following on from that, how can I best use my list of integers as inputs to polyfit?

Here is the polyfit example I am following:

from pylab import * 

x = arange(data) 
y = arange(data) 

m,b = polyfit(x, y, 1) 

plot(x, y, 'yo', x, m*x+b, '--k') 
show() 

DSM 162

arange generates lists (well, numpy arrays); type help(np.arange) for the details. You don't need to call it on existing lists.

>>> import numpy as np
>>> x = [1,2,3,4]
>>> y = [3,5,7,9] 
>>> 
>>> m,b = np.polyfit(x, y, 1)
>>> m
2.0000000000000009
>>> b
0.99999999999999833
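
If you do need an actual array for element-wise math, the conversion the question asks about is np.asarray (a minimal sketch; the variable names here are only illustrative):

import numpy as np

x_list = [1, 2, 3, 4]          # plain Python list
x_arr = np.asarray(x_list)     # convert the list to a numpy array
print(2.0 * x_arr + 1.0)       # element-wise math now works: [3. 5. 7. 9.]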

I should add that I tend to use poly1d here rather than writing out "m*x + b" and the higher-order equivalents, so my version of your code would look something like this:

import numpy as np
import matplotlib.pyplot as plt

x = [1,2,3,4]
y = [3,5,7,10] # 10, not 9, so the fit isn't perfect

coef = np.polyfit(x,y,1)
poly1d_fn = np.poly1d(coef) 
# poly1d_fn is now a function which takes in x and returns an estimate for y

plt.plot(x,y, 'yo', x, poly1d_fn(x), '--k')
plt.xlim(0, 5)
plt.ylim(0, 12)
plt.show()
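
As a quick illustration of the "higher-order equivalents" mentioned above, the same poly1d pattern works for, say, a quadratic fit just by changing the degree argument (a sketch with made-up sample data):

import numpy as np
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [1, 3, 7, 13, 21]            # roughly quadratic sample data

coef2 = np.polyfit(x, y, 2)      # degree-2 fit instead of degree-1
quad_fn = np.poly1d(coef2)       # callable polynomial, same idea as before

xs = np.linspace(0, 4, 100)
plt.plot(x, y, 'yo', xs, quad_fn(xs), '--k')
plt.show()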


Geo*_*lis 35

This code:

from scipy.stats import linregress

linregress(x,y) #x and y are arrays or lists.

gives back a result object containing the following:

slope : float
    slope of the regression line
intercept : float
    intercept of the regression line
rvalue : float
    correlation coefficient
pvalue : float
    two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero
stderr : float
    standard error of the estimate

Source: the scipy.stats.linregress documentation
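
A minimal, self-contained sketch of calling it and reading those attributes off the result (the sample data is made up for illustration):

from scipy.stats import linregress

x = [1, 2, 3, 4]
y = [3, 5, 7, 10]

res = linregress(x, y)
print(res.slope, res.intercept)   # fitted line is y = slope*x + intercept
print(res.rvalue ** 2)            # R-squared, from the correlation coefficient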


tdy 18

Use statsmodels.api.OLS to get a detailed breakdown of the fit / coefficients / residuals:

import statsmodels.api as sm

df = sm.datasets.get_rdataset('Duncan', 'carData').data
y = df['income']
x = df['education']

model = sm.OLS(y, sm.add_constant(x))
results = model.fit()

print(results.params)
# const        10.603498 <- intercept
# education     0.594859 <- slope
# dtype: float64

print(results.summary())
#                             OLS Regression Results                            
# ==============================================================================
# Dep. Variable:                 income   R-squared:                       0.525
# Model:                            OLS   Adj. R-squared:                  0.514
# Method:                 Least Squares   F-statistic:                     47.51
# Date:                Thu, 28 Apr 2022   Prob (F-statistic):           1.84e-08
# Time:                        00:02:43   Log-Likelihood:                -190.42
# No. Observations:                  45   AIC:                             384.8
# Df Residuals:                      43   BIC:                             388.5
# Df Model:                           1                                         
# Covariance Type:            nonrobust                                         
# ==============================================================================
#                  coef    std err          t      P>|t|      [0.025      0.975]
# ------------------------------------------------------------------------------
# const         10.6035      5.198      2.040      0.048       0.120      21.087
# education      0.5949      0.086      6.893      0.000       0.421       0.769
# ==============================================================================
# Omnibus:                        9.841   Durbin-Watson:                   1.736
# Prob(Omnibus):                  0.007   Jarque-Bera (JB):               10.609
# Skew:                           0.776   Prob(JB):                      0.00497
# Kurtosis:                       4.802   Cond. No.                         123.
# ==============================================================================
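
Since the answer mentions residuals, note that the fitted values and residuals hang off the same results object; a short sketch, reusing the model fitted above:

# continuing from the OLS example above
print(results.fittedvalues.head())   # predicted income for each row
print(results.resid.head())          # residuals (observed - predicted)
print(results.rsquared)              # the R-squared reported in the summary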

New in matplotlib 3.5.0

To plot the best-fit line, just pass the slope m and intercept b into the new plt.axline:

import matplotlib.pyplot as plt

# extract intercept b and slope m
b, m = results.params

# plot y = m*x + b
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
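
To see the line against the data it was fit to, one possible follow-up (reusing df, b, and m from above) is to scatter the raw points first and then draw the axline over them:

plt.scatter(df['education'], df['income'], color='gold', edgecolors='k')
plt.axline(xy1=(0, b), slope=m, linestyle='--', color='k')
plt.xlabel('education')
plt.ylabel('income')
plt.show()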

Note that the slope m and intercept b can easily be extracted from any of the common regression methods:
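
For example, a rough sketch of pulling m and b out of the three approaches shown on this page (the variable names are only illustrative):

import numpy as np
import statsmodels.api as sm
from scipy.stats import linregress

x = [1, 2, 3, 4]
y = [3, 5, 7, 10]

m, b = np.polyfit(x, y, 1)                 # numpy.polyfit

res = linregress(x, y)                     # scipy.stats.linregress
m, b = res.slope, res.intercept

ols = sm.OLS(y, sm.add_constant(x)).fit()  # statsmodels OLS
b, m = ols.params                          # params order: [const, slope]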


小智 8

import numpy as np
import matplotlib.pyplot as plt 
from scipy import stats

x = np.array([1.5,2,2.5,3,3.5,4,4.5,5,5.5,6])
y = np.array([10.35,12.3,13,14.0,16,17,18.2,20,20.7,22.5])
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
mn=np.min(x)
mx=np.max(x)
x1=np.linspace(mn,mx,500)
y1=gradient*x1+intercept
plt.plot(x,y,'ob')
plt.plot(x1,y1,'-r')
plt.show()

Use this.


小智 6

George's answer pairs nicely with matplotlib's axline, which plots an infinite straight line.

from scipy.stats import linregress
import matplotlib.pyplot as plt

reg = linregress(x, y)   # x and y are the data lists/arrays from the earlier examples
plt.axline(xy1=(0, reg.intercept), slope=reg.slope, linestyle="--", color="k")
plt.show()