matplotlib直方图矩阵,使用Pandas,多个类别叠加

alk*_*key 1 python matplotlib histogram pandas

我正在尝试结合两种方法来创建直方图。

#Sample Data

df = pd.DataFrame({'V1':[1,2,3,4,5,6], 
         'V2': [43,35,6,7,31,34], 
         'V3': [23,75,67,23,56,32],
         'V4': [23,45,67,63,56,32],
        'V5': [23,5,67,23,6,2],
        'V6': [23,78,67,76,56,2],
        'V7': [23,45,67,53,56,32],
        'V8': [5,5,5,5,5,5],
        'cat': ["A","B","C","A","B","B"],})
Run Code Online (Sandbox Code Playgroud)

我可以使用此代码为每个类别创建一个直方图矩阵。

 #1. Creating histogram matrix for each category 

for i in df['cat'].unique():
    fig, ax = plt.subplots()
    df[df['cat']==i].hist(figsize=(20,20),ax =ax)
    fig.suptitle(i + " Feature-Class Relationships", fontsize = 20)
    fig.savefig('Histogram Matrix.png' %(i), dpi = 240)
Run Code Online (Sandbox Code Playgroud)

这将为每个类别创建一个单独的直方图矩阵。然而,我希望将类别叠加在同一矩阵上。

我可以使用这种方法创建重叠直方图:

#2. Overlaid histrogram for single variable

fig, ax = plt.subplots()
for i in df['cat'].unique():
    df[df['cat']==i]['V8'].hist(figsize=(12,8),ax =ax, alpha = 0.5, label = i)
ax.legend()
plt.show()
Run Code Online (Sandbox Code Playgroud)

然而,这只会创建一个重叠图像。我想为矩阵中的所有变量创建一个重叠直方图,即同一矩阵中显示的所有类别,而不是为每个类别创建一个单独的矩阵。我创建了以下代码,它是上述两种方法的组合,但它不会将每个直方图矩阵叠加在一起,并且仅创建最后一个图。

#3. Combining approaches to create a matrix of overlaid histograms

fig, ax = plt.subplots()
for i in df['cat'].unique():
    df[df['cat']==i].hist(figsize=(12,8),ax =ax, alpha = 0.5, label = i)
ax.legend()
fig.savefig('Combined.png', dpi = 240)
Run Code Online (Sandbox Code Playgroud)

我想做的事情可能吗?

小智 5

我想这就是你想要的。一个 2 列 4 行的矩阵,在该矩阵的每个“单元格”中,您都会获得类别重叠的列的直方图。

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'V1':[1,2,3,4,5,6], 
         'V2': [43,35,6,7,31,34], 
         'V3': [23,75,67,23,56,32],
         'V4': [23,45,67,63,56,32],
        'V5': [23,5,67,23,6,2],
        'V6': [23,78,67,76,56,2],
        'V7': [23,45,67,53,56,32],
        'V8': [5,5,5,5,5,5],
        'cat': ["A","B","C","A","B","B"],})

# Define your subplots matrix.
# In this example the fig has 4 rows and 2 columns
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

# This approach is better than looping through df.cat.unique
for g, d in df.groupby('cat'):
    d.hist(alpha = 0.5, ax=axes, label=g)

# Just outputing the legend for each column in fig
for c1, c2 in axes:
    c1.legend()
    c2.legend()

plt.show()
Run Code Online (Sandbox Code Playgroud)

这是输出:

次要情节