How to add a line of best fit to a scatter plot

Jav*_*ser 6 python plot numpy matplotlib pandas

I'm currently using Pandas and matplotlib for some data visualization, and I want to add a line of best fit to my scatter plot.

Here is my code:

import matplotlib
import matplotlib.pyplot as plt
import pandas as panda
import numpy as np

def PCA_scatter(filename):

   matplotlib.style.use('ggplot')

   data = panda.read_csv(filename)
   data_reduced = data[['2005', '2015']]

   data_reduced.plot(kind='scatter', x='2005', y='2015')
   plt.show()

PCA_scatter('file.csv')

How can I do this?

Rob*_*oun 9

You can use Seaborn to do the fit and the plot in one go.

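import pandas as pd
import seaborn as sns
# raw string so that \s is passed to the regex engine unchanged
data_reduced = pd.read_csv('fake.txt', sep=r'\s+')
# recent seaborn releases require x and y as keyword arguments
sns.regplot(x=data_reduced['2005'], y=data_reduced['2015'])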

(Figure: regression plot)

  • But I want to use matplotlib! :( (6 upvotes)

Ste*_*fan 6

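You can use np.polyfit() and np.poly1d(). Fit a first-degree polynomial to the data, evaluate it at the same x values, and add the result to the ax object returned by the .scatter() plot. For example: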

import numpy as np

     2005   2015
0   18882  21979
1    1161   1044
2     482    558
3    2105   2471
4     427   1467
5    2688   2964
6    1806   1865
7     711    738
8     928   1096
9    1084   1309
10    854    901
11    827   1210
12   5034   6253

Fit a first-degree polynomial:

z = np.polyfit(x=df.loc[:, 2005], y=df.loc[:, 2015], deg=1)
p = np.poly1d(z)
df['trendline'] = p(df.loc[:, 2005])

     2005   2015     trendline
0   18882  21979  21989.829486
1    1161   1044   1418.214712
2     482    558    629.990208
3    2105   2471   2514.067336
4     427   1467    566.142863
5    2688   2964   3190.849200
6    1806   1865   2166.969948
7     711    738    895.827339
8     928   1096   1147.734139
9    1084   1309   1328.828428
10    854    901   1061.830437
11    827   1210   1030.487195
12   5034   6253   5914.228708

Then plot:

ax = df.plot.scatter(x=2005, y=2015)
df.set_index(2005, inplace=True)
df.trendline.sort_index(ascending=False).plot(ax=ax)
plt.gca().invert_xaxis()

This produces the scatter plot with the fitted trendline drawn on it.

The fit also gives you the equation of the line:

'y={0:.2f} x + {1:.2f}'.format(z[0],z[1])

y=1.16 x + 70.46
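
For a matplotlib-only version of the steps above, a minimal sketch might look like the following. It assumes the sample values from the table in this answer, typed in as NumPy arrays; the names `x_line`, `slope` and `intercept` are illustrative and do not come from the original code.

```python
import numpy as np
import matplotlib.pyplot as plt

# The 2005 and 2015 columns from the sample table in this answer.
x = np.array([18882, 1161, 482, 2105, 427, 2688, 1806,
              711, 928, 1084, 854, 827, 5034], dtype=float)
y = np.array([21979, 1044, 558, 2471, 1467, 2964, 1865,
              738, 1096, 1309, 901, 1210, 6253], dtype=float)

# Degree-1 least-squares fit; polyfit returns the highest power first.
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")

# A straight line only needs its two end points, so evaluate it there.
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, slope * x_line + intercept, color="k",
        label=f"y={slope:.2f} x + {intercept:.2f}")
ax.set_xlabel("2005")
ax.set_ylabel("2015")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to see the result. With a DataFrame like the one in the question, data['2005'] and data['2015'] could be passed in place of the two arrays.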


Ale*_*ams 5

Another option (using np.linalg.lstsq):

import numpy as np
import matplotlib.pyplot as plt

# generate some fake data
N = 50
x = np.random.randn(N, 1)
y = x*2.2 + np.random.randn(N, 1)*0.4 - 1.8
plt.axhline(0, color='r', zorder=-1)
plt.axvline(0, color='r', zorder=-1)
plt.scatter(x, y)

# fit least-squares with an intercept (the column of ones)
w = np.linalg.lstsq(np.hstack((x, np.ones((N, 1)))), y, rcond=None)[0]
xx = np.linspace(*plt.gca().get_xlim())

# plot best-fit line: w[0] is the slope, w[1] the intercept
plt.plot(xx, w[0]*xx + w[1], '-k')

(Figure: line of best fit)
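
As a quick sanity check, np.linalg.lstsq with a column of ones and np.polyfit with deg=1 solve the same least-squares problem, so they should return the same slope and intercept. The sketch below assumes 1-D fake data and a fixed seed; the names `A`, `w_lstsq` and `w_polyfit` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
N = 50
x = rng.standard_normal(N)
y = 2.2 * x + 0.4 * rng.standard_normal(N) - 1.8

# Design matrix with columns [x, 1]: solve A @ [slope, intercept] ~ y.
A = np.column_stack((x, np.ones(N)))
w_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

# polyfit with deg=1 returns [slope, intercept] for the same problem.
w_polyfit = np.polyfit(x, y, deg=1)

same = np.allclose(w_lstsq, w_polyfit)
print(same)
```

The lstsq form is the one that extends naturally to several predictors (add more columns to the design matrix), while polyfit is shorter for a single x variable.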