How to sort boxplot values in increasing order (by median)?

Sca*_*Boy 1 python matplotlib pandas seaborn

Here is my pandas DataFrame:

Area            Gender  Quantity
XXX             Men     115
XXX             Men     105    
XXX             Men     114
YYY             Men     100
YYY             Men     90    
YYY             Men     95
YYY             Men     101
XXX             Women   120    
XXX             Women   122
XXX             Women   115
XXX             Women   117    
YYY             Women   91
YYY             Women   90
YYY             Women   90

This is how I create the boxplot:

import seaborn as sns
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15, 11))
ax = sns.boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3", ax=ax)

I want to sort the groups in Area by the median of Quantity, in increasing order. How can I do this?
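To make the goal concrete, here is a minimal pandas sketch that rebuilds the example data and computes the median Quantity of every Area/Gender combination. The desired order of the Gender boxes inside each Area is just these medians sorted ascending; the names medians and order_per_area are illustrative only.

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame(
    [["XXX", "Men", 115], ["XXX", "Men", 105], ["XXX", "Men", 114],
     ["YYY", "Men", 100], ["YYY", "Men", 90], ["YYY", "Men", 95],
     ["YYY", "Men", 101],
     ["XXX", "Women", 120], ["XXX", "Women", 122], ["XXX", "Women", 115],
     ["XXX", "Women", 117],
     ["YYY", "Women", 91], ["YYY", "Women", 90], ["YYY", "Women", 90]],
    columns=["Area", "Gender", "Quantity"],
)

# Median Quantity per (Area, Gender): one row per Area, one column per Gender
medians = df.groupby(["Area", "Gender"])["Quantity"].median().unstack()

# Desired box order inside each Area: Gender levels by increasing median
order_per_area = {area: row.sort_values().index.tolist()
                  for area, row in medians.iterrows()}
print(order_per_area)
```

With this data the two Areas need opposite orders (Men first in XXX, Women first in YYY), which is why a single global ordering is not enough.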

nor*_*ius 5

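With the current version of seaborn (<= 0.9.0) this is not possible. The best you can do at the moment is to set hue_order (for example hue_order=['Women', 'Men']), but that order is applied to all groups alike, which is not what you want.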
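For reference, a single hue_order can be derived from the overall median per Gender with pandas. This is a sketch assuming the question's data; the commented-out sns.boxplot call shows where the list would go. Because the same order is used for every Area, it cannot be right for both groups here.

```python
import pandas as pd

# The question's data, written column-wise for brevity
df = pd.DataFrame({
    "Area": ["XXX"] * 3 + ["YYY"] * 4 + ["XXX"] * 4 + ["YYY"] * 3,
    "Gender": ["Men"] * 7 + ["Women"] * 7,
    "Quantity": [115, 105, 114, 100, 90, 95, 101,
                 120, 122, 115, 117, 91, 90, 90],
})

# One global order: Gender levels by their overall median Quantity
hue_order = df.groupby("Gender")["Quantity"].median().sort_values().index.tolist()
print(hue_order)  # ['Men', 'Women']

# This order would be applied to every Area, e.g.
# sns.boxplot(x="Area", y="Quantity", hue="Gender", data=df, hue_order=hue_order)

# ...but inside Area "YYY" the Women median is lower than the Men median
yyy_medians = df[df["Area"] == "YYY"].groupby("Gender")["Quantity"].median()
print(yyy_medians["Women"] < yyy_medians["Men"])  # True
```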

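Furthermore, extending boxplot() is not that straightforward, because seaborn does not expose the class responsible for the drawing in its official API. See the entry point of boxplot() here (permalink to the seaborn master branch as of 20 October 2018, git hash: 84ca6c6).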

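If you are not afraid of using seaborn's internal objects, you can create your own sorted_boxplot(). Probably the simplest way to implement the ordering is to modify the following line in _BoxPlotter.draw_boxplot() (permalink, git: 84ca6c6):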

# Original
center = i + offsets[j]

# Fix:
ordered_offsets = ...
center = i + ordered_offsets[j]

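center is the position of the box, i is the index of the group, and j is the index of the current hue level. I tested this by subclassing _BoxPlotter and overriding draw_boxplot(); see the code below.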
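The reordering itself only needs NumPy. Below is a standalone sketch (the helper sorted_centers is hypothetical, and the offsets are illustrative values, not taken from seaborn) that, for group index i, gives each hue level the offset that matches its median rank. Note the double argsort: np.argsort(medians) lists hue indices from smallest to largest median, and a second argsort turns that into each hue level's rank, which is what the offset lookup needs once there are three or more hue levels. With only two levels both give the same result.

```python
import numpy as np

def sorted_centers(i, hue_offsets, medians):
    """Return the x position of every hue box in group i.

    Hue level j is placed at the offset of its median rank, so the box
    with the smallest median ends up leftmost (hypothetical helper).
    """
    hue_offsets = np.asarray(hue_offsets)
    # ranks[j] = position of hue level j when medians are sorted ascending
    ranks = np.argsort(np.argsort(medians))
    return i + hue_offsets[ranks]

# Three hue levels with evenly spaced, illustrative offsets
offsets = np.array([-0.27, 0.0, 0.27])
centers = sorted_centers(1, offsets, [2.0, 3.0, 1.0])
print(centers)  # hue level 2 (median 1.0) gets the leftmost position
```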

PS: It would be great if someone worked this out into a pull request for seaborn. The feature would certainly be useful.


The following works for me (Python 3.6, seaborn 0.9.0):

import numpy as np
import matplotlib.pyplot as plt  # needed by plt.gca() in sorted_boxplot()
import seaborn as sns
from seaborn.categorical import _BoxPlotter
from seaborn.utils import remove_na

class SortedBoxPlotter(_BoxPlotter):
    def __init__(self, *args, **kwargs):
        super(SortedBoxPlotter, self).__init__(*args, **kwargs)

    def draw_boxplot(self, ax, kws):
        '''
        Below code has been copied partly from seaborn.categorical.py
        and is reproduced only for educational purposes.
        '''
        if self.plot_hues is None:
            # Sorting by hue doesn't apply here: fall back to the default drawing
            return super(SortedBoxPlotter, self).draw_boxplot(ax, kws)

        vert = self.orient == "v"
        props = {}
        for obj in ["box", "whisker", "cap", "median", "flier"]:
            props[obj] = kws.pop(obj + "props", {})

        for i, group_data in enumerate(self.plot_data):

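            # ==> Sort offsets by median
            offsets = self.hue_offsets
            # nanmedian ignores missing values, like remove_na() below
            medians = [np.nanmedian(group_data[self.plot_hues[i] == h])
                       for h in self.hue_names]
            # argsort of argsort gives each hue level's rank, so the box with
            # the smallest median gets the leftmost offset (also for > 2 levels)
            offsets_sorted = offsets[np.argsort(np.argsort(medians))]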

            # Draw nested groups of boxes
            for j, hue_level in enumerate(self.hue_names):

                # Add a legend for this hue level
                if not i:
                    self.add_legend_data(ax, self.colors[j], hue_level)

                # Handle case where there is data at this level
                if group_data.size == 0:
                    continue

                hue_mask = self.plot_hues[i] == hue_level
                box_data = remove_na(group_data[hue_mask])

                # Handle case where there is no non-null data
                if box_data.size == 0:
                    continue

                # ==> Fix ordering
                center = i + offsets_sorted[j]

                artist_dict = ax.boxplot(box_data,
                                         vert=vert,
                                         patch_artist=True,
                                         positions=[center],
                                         widths=self.nested_width,
                                         **kws)
                self.restyle_boxplot(artist_dict, self.colors[j], props)

def sorted_boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
                   orient=None, color=None, palette=None, saturation=.75,
                   width=.8, dodge=True, fliersize=5, linewidth=None,
                   whis=1.5, notch=False, ax=None, **kwargs):

    '''
    Same as sns.boxplot(), except that nested groups of boxes are plotted by
    increasing median.
    '''

    plotter = SortedBoxPlotter(x, y, hue, data, order, hue_order,
                               orient, color, palette, saturation,
                               width, dodge, fliersize, linewidth)
    if ax is None:
        ax = plt.gca()
    kwargs.update(dict(whis=whis, notch=notch))
    plotter.plot(ax, kwargs)
    return ax

To run it with your example data:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([["XXX", "Men",   115],
                   ["XXX", "Men",   105],
                   ["XXX", "Men",   114],
                   ["YYY", "Men",   100],
                   ["YYY", "Men",   90],
                   ["YYY", "Men",   95],
                   ["YYY", "Men",   101],
                   ["XXX", "Women", 120],
                   ["XXX", "Women", 122],
                   ["XXX", "Women", 115],
                   ["XXX", "Women", 117],
                   ["YYY", "Women", 91],
                   ["YYY", "Women", 90],
                   ["YYY", "Women", 90]],
                  columns=["Area", "Gender", "Quantity"])
sorted_boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")
plt.show()

Result:

[Image: resulting boxplot]
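If relying on seaborn's private classes is a concern (they are internal and may change between versions), the same idea can be sketched with plain matplotlib: compute each box's position from its median rank within the group and pass it to Axes.boxplot via positions. This is a swapped-in alternative, not the answer's method; the function median_sorted_boxplot is hypothetical, and colors, legend and other styling are left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def median_sorted_boxplot(df, x, y, hue, width=0.8, ax=None):
    """Draw one box per (x, hue) pair, ordering the hue boxes inside each
    x group by increasing median of y. Returns (ax, positions)."""
    if ax is None:
        ax = plt.gca()
    groups = list(pd.unique(df[x]))
    hues = list(pd.unique(df[hue]))
    box_width = width / len(hues)
    # Evenly spaced offsets centred on 0, one slot per hue level
    offsets = (np.arange(len(hues)) - (len(hues) - 1) / 2) * box_width
    positions = {}
    for i, g in enumerate(groups):
        sub = df[df[x] == g]
        data = [sub.loc[sub[hue] == h, y].dropna().to_numpy() for h in hues]
        # Empty combinations get +inf so they rank last (and are skipped below)
        medians = [np.median(d) if d.size else np.inf for d in data]
        ranks = np.argsort(np.argsort(medians))  # rank of each hue level
        for j, h in enumerate(hues):
            if data[j].size == 0:
                continue
            pos = i + offsets[ranks[j]]
            ax.boxplot(data[j], positions=[pos], widths=box_width * 0.9)
            positions[(g, h)] = pos
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(groups)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax, positions

df = pd.DataFrame({
    "Area": ["XXX"] * 3 + ["YYY"] * 4 + ["XXX"] * 4 + ["YYY"] * 3,
    "Gender": ["Men"] * 7 + ["Women"] * 7,
    "Quantity": [115, 105, 114, 100, 90, 95, 101,
                 120, 122, 115, 117, 91, 90, 90],
})
fig, ax = plt.subplots()
ax, positions = median_sorted_boxplot(df, "Area", "Quantity", "Gender", ax=ax)
print(positions)
```

The returned positions make the ordering easy to check: inside YYY the Women box sits left of the Men box, while inside XXX it is the other way round.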