Adding a line to a matplotlib scatter plot based on its slope

Den*_*nis 3 python matplotlib pandas

I have a scatter plot built from a DataFrame. It shows the correlation between two variables, length and age:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(......)  # data elided in the question; needs 'length' and 'age' columns
plt.title('Fish Length vs Age')
plt.xlabel('Length')
plt.ylabel('Age (days)')
plt.scatter(df['length'], df['age'])
plt.show()

[Image: the resulting scatter plot of age vs length]

Now I want to add a line with a given slope of 0.88 to this scatter plot. How do I do that?

PS: All the examples I managed to find draw a line from points rather than from a slope.

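Update: I reread the theory, and it turns out that the idea of plotting the correlation coefficient over the data points was something I made up :) partly because of this image I had in my head: [image]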

But I'm still confused about matplotlib's line-drawing functions.

Kir*_*ane 6

Building on @JinxunLi's answer: you just want to add a line that connects two points.

Each point has an x and a y coordinate, so for the two points you have four numbers: x_0, y_0, x_1, y_1.

Let's assume you want the x-coordinates of the two points to span the x-axis, so you set x_0 and x_1 manually:

x_0 = 0
x_1 = 5000

Alternatively, you can read the minimum and maximum from the axes:

# ax is the Axes holding the scatter plot, e.g. from fig, ax = plt.subplots()
x_min, x_max = ax.get_xlim()
x_0 = x_min
x_1 = x_max
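A minimal sketch of the `get_xlim` approach, assuming made-up data (`x` and `y` below are hypothetical stand-ins for the length and age columns) and the Agg backend so it runs without a display. Plotting the line adds new data, so autoscaling would normally pad the x-limits again; calling `set_xlim` afterwards pins them to the values that were read.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for df['length'] and df['age']
rng = np.random.default_rng(0)
x = rng.uniform(0, 5000, 25)
y = rng.uniform(0, 160, 25)

fig, ax = plt.subplots()
ax.scatter(x, y)

# Read the x-range matplotlib chose for the scatter
x_min, x_max = ax.get_xlim()

# Line with slope 0.03 through (0, 0), evaluated at both edges of the axes
slope = 0.03
x_0, y_0 = 0.0, 0.0
y_left = y_0 + slope * (x_min - x_0)
y_right = y_0 + slope * (x_max - x_0)
ax.plot([x_min, x_max], [y_left, y_right], c='r')

# Autoscaling would pad the limits around the new line; pin them back
ax.set_xlim(x_min, x_max)
```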

The slope of a line is defined as increase in y / increase in x:

slope = (y_1 - y_0) / (x_1 - x_0)

This can be rearranged as:

(y_1 - y_0) = slope * (x_1 - x_0)

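There are infinitely many parallel lines with this slope, so you have to fix one of the points to start from. For this example, let's assume you want the line to pass through the origin (0, 0):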

x_0 = 0 # We already know this as it was set earlier
y_0 = 0
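To see why a point has to be fixed, here is a sketch that draws three lines with the same slope of 0.88 through different points (0, y_0). The y_0 values are arbitrary and only for illustration: all three lines are parallel, and only the anchor point tells them apart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

slope = 0.88
x_0 = 0.0
xs = np.array([0.0, 100.0])  # two x positions are enough to draw a straight line

fig, ax = plt.subplots()
for y_0 in (-20.0, 0.0, 20.0):  # arbitrary anchor points (0, y_0)
    ax.plot(xs, y_0 + slope * (xs - x_0), label=f"through (0, {y_0:g})")
ax.legend()
```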

Now you can rearrange the formula to solve for y_1:

y_1 = slope * (x_1 - x_0) + y_0
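The rearranged formula can be wrapped in a small function. `y_on_line` is a hypothetical helper name, not part of matplotlib; it simply evaluates the point-slope form at any x.

```python
def y_on_line(x, x_0, y_0, slope):
    """Hypothetical helper: y-value at x on the line of the given slope through (x_0, y_0)."""
    # Rearranged from slope = (y_1 - y_0) / (x_1 - x_0)
    return y_0 + slope * (x - x_0)

# Slope 0.88, line through the origin, far end at x = 5000
y_1 = y_on_line(5000, 0, 0, 0.88)
print(y_1)
```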

If you know you want a slope of 0.88, you can then compute the y position of the other point:

y_1 = 0.88 * (5000 - 0) + 0

For the data shown in your question, a line with slope 0.88 would shoot off the top of the y-axis almost immediately (y_1 = 4400 in the example above).
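A quick arithmetic check of how steep a visible line can be, assuming y-values up to about 160 (the age range used in the simulated data of the example that follows, not the asker's real data) and a line through the origin that ends at x = 5000:

```python
# Assumed y-range: ages up to about 160 days, as in the simulated example data
y_top = 160.0
x_0, y_0 = 0.0, 0.0
x_1 = 5000.0

# Steepest slope whose end point (x_1, y_1) stays at or below y_top
max_slope = (y_top - y_0) / (x_1 - x_0)

# End point for the requested slope of 0.88, for comparison
y_1 = y_0 + 0.88 * (x_1 - x_0)

print(max_slope, y_1, y_1 > y_top)
```

Under these assumptions the slope has to stay at or below roughly 0.032 for the end point to remain on screen, which is consistent with the 0.03 chosen for the example.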

In the example below, I've drawn a line with slope = 0.03 instead.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )

df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000

# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])

# Now add on a line with a fixed slope of 0.03
slope = 0.03

# A line with a fixed slope can intercept the axis
# anywhere so we're going to have it go through 0,0
x_0 = 0
y_0 = 0

# And we'll have the line stop at x = 5000
x_1 = 5000
y_1 = slope * (x_1 - x_0) + y_0

# Draw these two points with big triangles to make it clear
# where they lie
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')

# And now connect them
ax.plot([x_0, x_1], [y_0, y_1], c='r')    

plt.show()

[Image: the simulated scatter plot with the red slope-0.03 line and its two endpoints marked by triangles]
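As an alternative, matplotlib 3.3 and later provide `Axes.axline`, which takes one point plus a `slope` keyword and draws an infinite line clipped to the axes, so no endpoint arithmetic is needed. A sketch with hypothetical data (`length` and `age` stand in for the DataFrame columns):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
length = rng.uniform(0, 5000, 25)  # hypothetical stand-in for df['Length']
age = rng.uniform(0, 160, 25)      # hypothetical stand-in for df['Age']

fig, ax = plt.subplots()
ax.scatter(length, age)

# Infinite line through (0, 0) with slope 0.03, clipped to the visible axes
line = ax.axline((0, 0), slope=0.03, color='r')
```

Note that `slope` is only accepted when both axes use linear scales; on log axes matplotlib raises an error.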


Jia*_* Li 5

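The correlation coefficient does not give you the slope of the regression line, because your two variables are on different scales. If you want a scatter plot with a regression line, I'd suggest seaborn, which does it in the fewest lines of code.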
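To illustrate the scale point, a sketch on synthetic data (the arrays are made up): the least-squares slope equals the correlation coefficient multiplied by the ratio of the standard deviations, `r * std(y) / std(x)`, so r alone only equals the slope when both variables have the same spread.

```python
import numpy as np

# Synthetic data on very different scales (hypothetical, like length vs age)
rng = np.random.default_rng(2)
x = rng.uniform(0, 5000, 200)
y = 0.03 * x + rng.normal(0, 10, 200)

r = np.corrcoef(x, y)[0, 1]                      # unitless, always in [-1, 1]
slope_fit, intercept_fit = np.polyfit(x, y, 1)   # least-squares line y = slope*x + intercept

# The fitted slope is r rescaled by the ratio of standard deviations
slope_from_r = r * y.std() / x.std()
print(r, slope_fit, slope_from_r)
```

Here r comes out close to 1 while the fitted slope is close to 0.03, because x spans thousands of units and y only about 150.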

To install seaborn:

pip install seaborn

Code example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])

print(df.head())  # inspect the first rows

# plot
# ====================================
sns.set_style('ticks')
# recent seaborn versions require x and y as keyword arguments
sns.regplot(x='X', y='Y', data=df, ci=None)
sns.despine()
plt.show()

[Image: seaborn scatter plot with the fitted regression line]

Edit:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])


# plot
# ==============================
fig, ax = plt.subplots()
ax.scatter(df.X, df.Y)

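# need a slope and c to fix the position of the line;
# here c is the y-value where the line meets the left edge of the axes (x = x_min)
slope = 10
c = -100

x_min, x_max = ax.get_xlim()
y_left, y_right = c, c + slope*(x_max-x_min)
ax.plot([x_min, x_max], [y_left, y_right])
# restore the original x-range, which plotting the line would otherwise pad
ax.set_xlim([x_min, x_max])
plt.show()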

[Image: scatter plot with the added line of slope 10]
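For reference, the two endpoints in the edit, (x_min, c) and (x_max, c + slope·(x_max - x_min)), describe the line below, writing m for `slope`. So `c` is the height of the line at the left edge of the axes rather than its y-intercept at x = 0; if you want a conventional intercept b, set c = b + m·x_min.

```latex
y(x) = c + m\,(x - x_{\min})
\quad\Longleftrightarrow\quad
y(x) = m\,x + b, \qquad b = c - m\,x_{\min}
```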