Asked by Sca*_*Boy. Tags: python, matplotlib, pandas, seaborn
This is my pandas DataFrame:
Area Gender Quantity
XXX Men 115
XXX Men 105
XXX Men 114
YYY Men 100
YYY Men 90
YYY Men 95
YYY Men 101
XXX Women 120
XXX Women 122
XXX Women 115
XXX Women 117
YYY Women 91
YYY Women 90
YYY Women 90
This is how I create the boxplot:
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(15,11))
ax = sns.boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")
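I would like to sort the groups within each Area by their median Quantity, in increasing order. How can I do that?
With the current version of seaborn (<=0.9.0), this is not possible. The best you can do at the moment is to set hue_order (e.g. hue_order=['Women', 'Men']), but that order applies to all groups equally, which is not what you want.
Furthermore, extending boxplot() is not that simple, because seaborn does not expose the classes responsible for drawing in its official API. See the entry point, boxplot(), in the seaborn master branch as of 20 October 2018 (git hash: 84ca6c6).
If you are not afraid of using seaborn's internal objects, you can create your own sorted_boxplot(). Probably the simplest way to implement the ordering is to modify the following line in _BoxPlotter.draw_boxplot() (git hash: 84ca6c6):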
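To see why a single hue_order cannot help, here is a small stdlib-only sketch (not seaborn code) that computes the median Quantity of each Gender within each Area from the example data and derives the increasing-median order per Area. The helper name hue_order_by_median is hypothetical; ties between equal medians are broken alphabetically by level name.

```python
from collections import defaultdict
from statistics import median

# Example data from the question: (Area, Gender, Quantity)
ROWS = [
    ("XXX", "Men", 115), ("XXX", "Men", 105), ("XXX", "Men", 114),
    ("YYY", "Men", 100), ("YYY", "Men", 90), ("YYY", "Men", 95), ("YYY", "Men", 101),
    ("XXX", "Women", 120), ("XXX", "Women", 122), ("XXX", "Women", 115),
    ("XXX", "Women", 117),
    ("YYY", "Women", 91), ("YYY", "Women", 90), ("YYY", "Women", 90),
]


def hue_order_by_median(rows):
    """Hypothetical helper: for each area, list hue levels by increasing median."""
    values = defaultdict(list)
    for area, gender, quantity in rows:
        values[(area, gender)].append(quantity)

    # Collect (median, level) pairs per area, then sort them
    pairs = defaultdict(list)
    for (area, gender), quantities in values.items():
        pairs[area].append((median(quantities), gender))
    return {area: [g for _, g in sorted(p)] for area, p in pairs.items()}


print(hue_order_by_median(ROWS))
# {'XXX': ['Men', 'Women'], 'YYY': ['Women', 'Men']}
```

The medians are 114 vs 118.5 in XXX but 97.5 vs 90 in YYY, so the desired order flips between the two areas, which a global hue_order cannot express.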
# Original
center = i + offsets[j]
# Fix:
ordered_offsets = ...
center = i + ordered_offsets[j]
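center is the position of the box, i is the index of the group, and j is the index of the current hue level. I tested this by deriving from _BoxPlotter and overriding draw_boxplot(); see the code below.
PS: It would be great if someone elaborated on this to propose a pull request to seaborn. The feature is certainly useful.
The following works for me (Python 3.6, seaborn 0.9.0):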
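The position arithmetic described above can be sketched without seaborn. This stdlib-only version assumes evenly spaced, zero-centred hue offsets over a total width of 0.8 (our reading of how seaborn 0.9 lays out nested boxes, not a documented API); hue_offsets, argsort and box_centers are hypothetical helpers. It also shows that hue level j has to be sent to the offset slot matching its rank, i.e. the argsort of the argsort of the medians.

```python
def hue_offsets(n_levels, width=0.8):
    """Evenly spaced offsets centred on 0 (assumed to mimic seaborn's layout)."""
    each = width / n_levels
    offsets = [k * each for k in range(n_levels)]
    mean = sum(offsets) / n_levels
    return [o - mean for o in offsets]


def argsort(values):
    """Indices that would sort `values` in increasing order."""
    return sorted(range(len(values)), key=lambda k: values[k])


def box_centers(i, medians, width=0.8):
    """x position of each hue level's box in group i; smallest median leftmost."""
    offsets = hue_offsets(len(medians), width)
    # rank[j] = position of hue level j once the medians are sorted
    rank = argsort(argsort(medians))
    return [i + offsets[rank[j]] for j in range(len(medians))]


# Group YYY (index 1), hue levels [Men, Women] with medians 97.5 and 90
print(box_centers(1, [97.5, 90]))  # roughly [1.2, 0.8]: Women drawn on the left
```

With two levels a single argsort happens to equal the rank, so indexing offsets by argsort(medians) works. With three levels and medians [2, 0, 1], a single argsort would put the smallest-median level on the right, whereas the rank places it on the left.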
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from seaborn.categorical import _BoxPlotter
from seaborn.utils import remove_na

class SortedBoxPlotter(_BoxPlotter):
    def __init__(self, *args, **kwargs):
        super(SortedBoxPlotter, self).__init__(*args, **kwargs)

    def draw_boxplot(self, ax, kws):
        '''
        Below code has been copied partly from seaborn.categorical.py
        and is reproduced only for educational purposes.
        '''
        if self.plot_hues is None:
            # Sorting by hue doesn't apply here. Just fall back
            # to the default implementation.
            return super(SortedBoxPlotter, self).draw_boxplot(ax, kws)

        vert = self.orient == "v"
        props = {}
        for obj in ["box", "whisker", "cap", "median", "flier"]:
            props[obj] = kws.pop(obj + "props", {})

        for i, group_data in enumerate(self.plot_data):

            # ==> Sort offsets by median
            offsets = self.hue_offsets
            medians = [np.median(group_data[self.plot_hues[i] == h])
                       for h in self.hue_names]
            # Rank of each hue level (argsort of argsort), so the level with
            # the smallest median gets the leftmost offset. A single argsort
            # only gives the same result when there are two hue levels.
            offsets_sorted = offsets[np.argsort(np.argsort(medians))]

            # Draw nested groups of boxes
            for j, hue_level in enumerate(self.hue_names):

                # Add a legend for this hue level
                if not i:
                    self.add_legend_data(ax, self.colors[j], hue_level)

                # Handle case where there is no data at this level
                if group_data.size == 0:
                    continue

                hue_mask = self.plot_hues[i] == hue_level
                box_data = remove_na(group_data[hue_mask])

                # Handle case where there is no non-null data
                if box_data.size == 0:
                    continue

                # ==> Fix ordering
                center = i + offsets_sorted[j]
                artist_dict = ax.boxplot(box_data,
                                         vert=vert,
                                         patch_artist=True,
                                         positions=[center],
                                         widths=self.nested_width,
                                         **kws)
                self.restyle_boxplot(artist_dict, self.colors[j], props)


def sorted_boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
                   orient=None, color=None, palette=None, saturation=.75,
                   width=.8, dodge=True, fliersize=5, linewidth=None,
                   whis=1.5, notch=False, ax=None, **kwargs):
    '''
    Same as sns.boxplot(), except that nested groups of boxes are plotted by
    increasing median.
    '''
    plotter = SortedBoxPlotter(x, y, hue, data, order, hue_order,
                               orient, color, palette, saturation,
                               width, dodge, fliersize, linewidth)
    if ax is None:
        ax = plt.gca()
    kwargs.update(dict(whis=whis, notch=notch))
    plotter.plot(ax, kwargs)
    return ax
To run it with your example data:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([["XXX", "Men", 115],
                   ["XXX", "Men", 105],
                   ["XXX", "Men", 114],
                   ["YYY", "Men", 100],
                   ["YYY", "Men", 90],
                   ["YYY", "Men", 95],
                   ["YYY", "Men", 101],
                   ["XXX", "Women", 120],
                   ["XXX", "Women", 122],
                   ["XXX", "Women", 115],
                   ["XXX", "Women", 117],
                   ["YYY", "Women", 91],
                   ["YYY", "Women", 90],
                   ["YYY", "Women", 90]],
                  columns=["Area", "Gender", "Quantity"])
sorted_boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")
plt.show()
Result: [plot image not available]