Scatter plot with marginal KDE plots and multiple categories in Matplotlib

Question

Dav*_*ave 5 margin matplotlib histogram scatter-plot kde-plasma

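I would like a function in Matplotlib similar to Matlab's "scatterhist" function, which takes continuous values for the "x" and "y" axes plus a categorical variable as input, and produces a scatter plot with marginal KDE plots in which two or more categories are shown in different colours. I have found examples of scatter plots with marginal histograms in Matplotlib, marginal histograms in Seaborn's jointplot, overlapping histograms in Matplotlib, and marginal KDE plots in Matplotlib, but I have not yet found any example that combines a scatter plot with marginal KDE plots and uses colour coding to indicate the different categories.

If possible, I would like a solution that uses "vanilla" Matplotlib without Seaborn, since this avoids an extra dependency and allows full control and customisation of the plot's appearance using standard Matplotlib commands.

I intend to try writing something based on the examples above, but before doing so I wanted to check whether a similar function is already available. If not, any guidance on the best approach would be greatly appreciated.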

Answer 1

Dav*_*ave 1

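@ImportanceOfBeingEarnest: Many thanks for your help. Here is my first attempt at a solution. It is a bit hacky, but it achieves my objectives and is fully customisable using standard matplotlib commands. I am posting the annotated code here in case anyone else wishes to use it or develop it further. If there are any improvements or neater ways of writing the code, I am always keen to learn and would be grateful for guidance.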

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy import stats

label = ['Setosa','Versicolor','Virginica'] # List of labels for categories
cl = ['b','r','y'] # List of colours for categories
categories = len(label)
sample_size = 20 # Number of samples in each category

# Create numpy arrays for dummy x and y data:
x = np.zeros(shape=(categories, sample_size))
y = np.zeros(shape=(categories, sample_size))

# Generate random data for each categorical variable:
for n in range(categories):
    x[n, :] = np.random.randn(sample_size) + 4 + n
    y[n, :] = np.random.randn(sample_size) + 6 - n

# Set up 4 subplots as axis objects using GridSpec:
gs = gridspec.GridSpec(2, 2, width_ratios=[1,3], height_ratios=[3,1])
# Add space between scatter plot and KDE plots to accommodate axis labels:
gs.update(hspace=0.3, wspace=0.3)

# Set background canvas colour to White instead of grey default
fig = plt.figure()
fig.patch.set_facecolor('white')

ax = plt.subplot(gs[0,1]) # Instantiate scatter plot area and axis range
ax.set_xlim(x.min(), x.max())
ax.set_ylim(y.min(), y.max())
ax.set_xlabel('x')
ax.set_ylabel('y')

axl = plt.subplot(gs[0,0], sharey=ax) # Instantiate left KDE plot area
axl.get_xaxis().set_visible(False) # Hide tick marks and spines
axl.get_yaxis().set_visible(False)
axl.spines["right"].set_visible(False)
axl.spines["top"].set_visible(False)
axl.spines["bottom"].set_visible(False)

axb = plt.subplot(gs[1,1], sharex=ax) # Instantiate bottom KDE plot area
axb.get_xaxis().set_visible(False) # Hide tick marks and spines
axb.get_yaxis().set_visible(False)
axb.spines["right"].set_visible(False)
axb.spines["top"].set_visible(False)
axb.spines["left"].set_visible(False)

axc = plt.subplot(gs[1,0]) # Instantiate legend plot area
axc.axis('off') # Hide tick marks and spines

# Plot data for each categorical variable as scatter and marginal KDE plots:
for n in range(categories):
    # Hollow markers: no fill, outline drawn in the category colour
    ax.scatter(x[n], y[n], facecolors='none', edgecolors=cl[n], label=label[n], s=100)

    kde = stats.gaussian_kde(x[n,:])
    xx = np.linspace(x.min(), x.max(), 1000)
    axb.plot(xx, kde(xx), color=cl[n])

    kde = stats.gaussian_kde(y[n,:])
    yy = np.linspace(y.min(), y.max(), 1000)
    axl.plot(kde(yy), yy, color=cl[n])

# Copy legend object from scatter plot to lower left subplot and display:
# NB 'scatterpoints = 1' customises legend box to show only 1 handle (icon) per label 
handles, labels = ax.get_legend_handles_labels()
axc.legend(handles, labels, scatterpoints = 1, loc = 'center', fontsize = 12)

plt.show()
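
Since the question asks for something closer to a scatterhist-style call (flat x and y values plus one categorical variable), the script above could be wrapped in a function along the following lines. This is only a sketch: the name `scatter_kde`, its parameters, and the returned dictionary keys are hypothetical and not part of Matplotlib. It assumes each category has at least two distinct values, since `scipy.stats.gaussian_kde` cannot fit a single point, and for brevity it hides all spines on the marginal axes rather than keeping one.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy import stats


def scatter_kde(x, y, groups, colours=None, n_grid=200):
    """Scatter plot with one marginal KDE curve per category (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    # Unique category labels, kept in first-seen order
    names = list(dict.fromkeys(groups.tolist()))
    if colours is None:
        # Assumption: fall back to Matplotlib's default colour cycle
        cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
        colours = [cycle[i % len(cycle)] for i in range(len(names))]

    fig = plt.figure()
    gs = gridspec.GridSpec(2, 2, width_ratios=[1, 3], height_ratios=[3, 1],
                           hspace=0.3, wspace=0.3)
    ax = fig.add_subplot(gs[0, 1])               # main scatter plot
    axl = fig.add_subplot(gs[0, 0], sharey=ax)   # left marginal (KDE of y)
    axb = fig.add_subplot(gs[1, 1], sharex=ax)   # bottom marginal (KDE of x)
    axc = fig.add_subplot(gs[1, 0])              # legend area
    for a in (axl, axb, axc):
        a.axis("off")                            # hide ticks and spines

    # Evaluate every KDE on a common grid spanning the full data range
    xx = np.linspace(x.min(), x.max(), n_grid)
    yy = np.linspace(y.min(), y.max(), n_grid)
    for name, colour in zip(names, colours):
        mask = groups == name
        ax.scatter(x[mask], y[mask], facecolors="none", edgecolors=colour,
                   s=100, label=str(name))
        axb.plot(xx, stats.gaussian_kde(x[mask])(xx), color=colour)
        axl.plot(stats.gaussian_kde(y[mask])(yy), yy, color=colour)

    ax.set_xlabel("x")
    ax.set_ylabel("y")
    handles, labels = ax.get_legend_handles_labels()
    axc.legend(handles, labels, scatterpoints=1, loc="center")
    return fig, {"scatter": ax, "left": axl, "bottom": axb, "legend": axc}
```

With the dummy data above, a call might look like `fig, axes = scatter_kde(x.ravel(), y.ravel(), np.repeat(label, sample_size), colours=cl)`, followed by `plt.show()`. Returning the axes leaves every part of the figure open to further customisation with ordinary Matplotlib commands.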

Archived:	6 years, 7 months ago
Views:	3820
Last recorded:	6 years, 7 months ago