Is it possible to draw a matplotlib boxplot given the percentile values instead of the original inputs?

Ale*_*uch 11 python matplotlib percentile boxplot python-2.7

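From what I can see, the boxplot() method expects a sequence of raw values (numbers) as input, from which it then computes percentiles to draw the boxplot(s).

I would like a way to pass in the percentiles directly and get the corresponding boxplot.

For example:

Assume that I have run several benchmarks and for each benchmark I have measured latencies (floating-point values). In addition, I have precomputed the percentiles for these values.

Hence for each benchmark I have the 25th, 50th and 75th percentiles along with the min and max.

Given this data, I would like to draw the box plots for the benchmarks.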

Vic*_*gio 30

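As of 2020, there is a better method than the one in the accepted answer.

The matplotlib.axes.Axes class provides a bxp method, which can be used to draw the boxes and whiskers from the percentile values. Raw data is only needed for the outliers, and those are optional.

Example: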

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
boxes = [
    {
        'label' : "Male height",
        'whislo': 162.6,    # Bottom whisker position
        'q1'    : 170.2,    # First quartile (25th percentile)
        'med'   : 175.7,    # Median         (50th percentile)
        'q3'    : 180.4,    # Third quartile (75th percentile)
        'whishi': 187.8,    # Top whisker position
        'fliers': []        # Outliers
    }
]
ax.bxp(boxes, showfliers=False)
ax.set_ylabel("cm")
plt.savefig("boxplot.png")
plt.close()

This produces the following image: (example boxplot)
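The same approach extends to the several benchmarks described in the question. The following is a minimal sketch, not part of the original answer: the benchmark names, the latency numbers and the helper to_bxp_stats are made up for illustration, and it assumes each benchmark is summarized as a (min, 25th, 50th, 75th, max) tuple with the whiskers drawn at the min and max.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical precomputed latency summaries in ms: (min, p25, p50, p75, max)
summaries = {
    "bench_a": (1.2, 2.5, 3.1, 4.0, 7.9),
    "bench_b": (0.8, 1.9, 2.4, 3.3, 5.6),
}

def to_bxp_stats(label, summary):
    """Convert a (min, q1, median, q3, max) tuple into a stats dict for Axes.bxp."""
    lo, q1, med, q3, hi = summary
    return {"label": label, "whislo": lo, "q1": q1, "med": med,
            "q3": q3, "whishi": hi, "fliers": []}

boxes = [to_bxp_stats(name, s) for name, s in summaries.items()]

fig, ax = plt.subplots()
artists = ax.bxp(boxes, showfliers=False)  # one box per stats dict
ax.set_ylabel("latency (ms)")
fig.savefig("benchmarks_boxplot.png")
plt.close(fig)
```

bxp returns the same kind of artist dictionary as boxplot, so the boxes can still be styled afterwards.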


Rag*_* RV 18

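To draw the box plot using only the percentile values and the outliers (if any), I wrote a customized_box_plot function that modifies the attributes of a basic box plot (generated from a tiny sample data set) so that it fits your percentile values.

The customized_box_plot function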

def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
    """
    Generates a customized boxplot based on the given percentile values
    """

    n_box = len(percentiles)  # one placeholder box per percentile tuple
    box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs)
    # Creates len(percentiles) no of box plots

    min_y, max_y = float('inf'), -float('inf')

    for box_no, (q1_start, 
                 q2_start,
                 q3_start,
                 q4_start,
                 q4_end,
                 fliers_xy) in enumerate(percentiles):

        # Lower cap
        box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
        # xdata is determined by the width of the box plot

        # Lower whiskers
        box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])

        # Higher cap
        box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])

        # Higher whiskers
        box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])

        # Box
        box_plot['boxes'][box_no].set_ydata([q2_start, 
                                             q2_start, 
                                             q4_start,
                                             q4_start,
                                             q2_start])

        # Median
        box_plot['medians'][box_no].set_ydata([q3_start, q3_start])

        # Outliers
        if fliers_xy is not None and len(fliers_xy[0]) != 0:
            # If outliers exist
            box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
                                           ydata = fliers_xy[1])

            min_y = min(q1_start, min_y, fliers_xy[1].min())
            max_y = max(q4_end, max_y, fliers_xy[1].max())

        else:
            min_y = min(q1_start, min_y)
            max_y = max(q4_end, max_y)

        # The y axis is rescaled to fit the new box plot completely with 10% 
        # of the maximum value at both ends
        axes.set_ylim([min_y*1.1, max_y*1.1])

    # If redraw is set to true, the canvas is updated.
    if redraw:
        axes.figure.canvas.draw()

    return box_plot

Usage

Using the inverse logic (code at the end), I extracted the percentile values from this example:

>>> percentiles
(-1.0597368367634488, 0.3977683984966961, 1.0298955252405229, 1.6693981537742526, 3.4951447843464449)
(-0.90494930553559483, 0.36916539612108634, 1.0303658700697103, 1.6874542731392828, 3.4951447843464449)
(0.13744105279440233, 1.3300645202649739, 2.6131540656339483, 4.8763411136047647, 9.5751914834437937)
(0.22786243898199182, 1.4120860286080519, 2.637650402506837, 4.9067126578493259, 9.4660357513550899)
(0.0064696168078617741, 0.30586770128093388, 0.70774153557312702, 1.5241965711101928, 3.3092932063051976)
(0.007009744579241136, 0.28627373934008982, 0.66039691869500572, 1.4772725266672091, 3.221716765477217)
(-2.2621660374110544, 5.1901313713883352, 7.7178532139979357, 11.277744848353247, 20.155971739152388)
(-2.2621660374110544, 5.1884411864079532, 7.3357079047721054, 10.792299385806913, 18.842012119715388)
(2.5417888074435702, 5.885996170695587, 7.7271286220368598, 8.9207423361593179, 10.846938621419374)
(2.5971767318505856, 5.753551925927133, 7.6569980004033464, 8.8161056254143233, 10.846938621419374)

Note that to keep this short I haven't shown the outlier vectors, which would be the 6th element of each percentile tuple.
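For illustration, a 6-element entry could be built as follows. This is only a sketch: the helper with_fliers is hypothetical, the five summary values are rounded from the third tuple above, and the two outlier values are made up. It assumes the outliers are passed as an (x, y) pair of NumPy arrays, since customized_box_plot calls .min() and .max() on the y values.

```python
import numpy as np

def with_fliers(summary, position, outlier_values):
    """Append an (x, y) outlier pair to a 5-value summary tuple.

    position is the x location of the box (1 for the first box, 2 for the
    second, and so on); outlier_values are the y values of the outliers.
    """
    y = np.asarray(outlier_values, dtype=float)
    x = np.full(y.shape, float(position))  # all outliers sit at the box's x position
    return tuple(summary) + ((x, y),)

# Made-up outliers for the third box
entry = with_fliers((0.14, 1.33, 2.61, 4.88, 9.58), position=3,
                    outlier_values=[12.4, 15.0])
```

A tuple without outliers can pass None or a pair of empty arrays as its 6th element, since the function checks for both cases.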

Also note that all the usual additional kwargs / args can be used, since they are simply passed on to the boxplot method inside it:

>>> fig, ax = plt.subplots()
>>> b = customized_box_plot(percentiles, ax, redraw=True, notch=0, sym='+', vert=1, whis=1.5)
>>> plt.show()

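(Image: box plot using percentile values)

Explanation

The boxplot method returns a dictionary mapping the components of the boxplot to the individual matplotlib.lines.Line2D instances that were created.

Quoting from the matplotlib.pyplot.boxplot documentation:

That dictionary has the following keys (assuming vertical boxplots):

boxes: the main body of the boxplot showing the quartiles and the median's confidence intervals if enabled.

medians: horizontal lines at the median of each box.

whiskers: the vertical lines extending to the most extreme, non-outlier data points.

caps: the horizontal lines at the ends of the whiskers.

fliers: points representing data that extend beyond the whiskers (outliers).

means: points or lines representing the means.

For example, observe the boxplot of a tiny sample data set, [-9, -4, 2, 4, 9]: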

>>> b = ax.boxplot([[-9, -4, 2, 4, 9],])
>>> b
{'boxes': [<matplotlib.lines.Line2D at 0x7fe1f5b21350>],
'caps': [<matplotlib.lines.Line2D at 0x7fe1f54d4e50>,
<matplotlib.lines.Line2D at 0x7fe1f54d0e50>],
'fliers': [<matplotlib.lines.Line2D at 0x7fe1f5b317d0>],
'means': [],
'medians': [<matplotlib.lines.Line2D at 0x7fe1f63549d0>],
'whiskers': [<matplotlib.lines.Line2D at 0x7fe1f5b22e10>,
             <matplotlib.lines.Line2D at 0x7fe20c54a510>]} 

>>> plt.show()

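(Image: sample box plot)

The matplotlib.lines.Line2D objects have two methods that I use extensively in my function: set_xdata (or set_ydata) and get_xdata (or get_ydata).

Using these methods, we can alter the positions of the constituent lines of the base box plot to conform to your percentile values (which is what the customized_box_plot function does). After altering the positions of the lines, you can redraw the canvas using figure.canvas.draw().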
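As a small sketch of those two methods in isolation (my own illustration, reusing the sample data from above and assuming the non-interactive Agg backend), the median line of a base box plot can be read and then moved like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
b = ax.boxplot([[-9, -4, 2, 4, 9]])

median_line = b["medians"][0]           # Line2D drawn at the median
old_y = list(median_line.get_ydata())   # y values of the line's two end points

median_line.set_ydata([3.0, 3.0])       # move the median line to y = 3
new_y = list(median_line.get_ydata())

fig.canvas.draw()                       # redraw so the change is rendered
plt.close(fig)
```

The x data is left untouched, so the line keeps the width that matplotlib computed for the box.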

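To summarize the mapping from percentiles to the coordinates of the various Line2D objects:

The Y coordinates:

  • The max (q4_end, the end of the 4th quartile) corresponds to the topmost cap Line2D object.
  • The min (q1_start, the start of the 1st quartile) corresponds to the lowermost cap Line2D object.
  • The median (q3_start) corresponds to the median Line2D object.
  • The two whiskers lie between the ends of the box and the extreme caps (q1_start and q2_start for the lower whisker; q4_start and q4_end for the upper whisker).
  • The box is actually an n-shaped line with a cap at the lower portion. The extremes of the n-shaped line correspond to q2_start and q4_start.

The X coordinates:

  • The central x coordinates (for multiple box plots these are usually 1, 2, 3, ...).
  • The library automatically calculates the bounding x coordinates based on the specified width.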
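The Y-coordinate mapping above can be written down as a small pure function. This is a sketch for illustration only; the function name and the dictionary keys are my own, while the values mirror the set_ydata calls in customized_box_plot.

```python
def line_ydata_from_percentiles(q1_start, q2_start, q3_start, q4_start, q4_end):
    """Return the y data for each line of one vertical box."""
    return {
        "lower_cap": [q1_start, q1_start],        # horizontal line at the min
        "lower_whisker": [q1_start, q2_start],    # from the min up to the box
        "box": [q2_start, q2_start, q4_start, q4_start, q2_start],
        "median": [q3_start, q3_start],           # horizontal line at the median
        "upper_whisker": [q4_start, q4_end],      # from the box up to the max
        "upper_cap": [q4_end, q4_end],            # horizontal line at the max
    }

# The five values of the tiny sample data set used above
ydata = line_ydata_from_percentiles(-9, -4, 2, 4, 9)
```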

Inverse function to retrieve the percentiles from the boxplot dict:

def get_percentiles_from_box_plots(bp):
    percentiles = []
    for i in range(len(bp['boxes'])):
        percentiles.append((bp['caps'][2*i].get_ydata()[0],
                           bp['boxes'][i].get_ydata()[0],
                           bp['medians'][i].get_ydata()[0],
                           bp['boxes'][i].get_ydata()[2],
                           bp['caps'][2*i + 1].get_ydata()[0],
                           (bp['fliers'][i].get_xdata(),
                            bp['fliers'][i].get_ydata())))
    return percentiles
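If the raw data is available, a possible alternative to reading the values back from the drawn lines is matplotlib.cbook.boxplot_stats, which computes the statistics without drawing anything. The sketch below is not part of the original answer, and the random sample is only a stand-in for real data.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(0)
data = rng.normal(size=200)   # stand-in for real measurements

# boxplot_stats returns one dict per data set; a 1-D input yields one dict
stats = cbook.boxplot_stats(data, whis=1.5)[0]

# The same five values that get_percentiles_from_box_plots reads from the lines
five = (stats["whislo"], stats["q1"], stats["med"], stats["q3"], stats["whishi"])
outliers = stats["fliers"]    # data points beyond the whiskers
```

The dict also contains the keys that Axes.bxp expects, so it can be stored and plotted later without the raw data.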

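Note: the reason I did not write a completely custom boxplot method is that the built-in box plot offers many features that cannot be fully reproduced.

Also, excuse me if I have unnecessarily explained something that may have been too obvious.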


mas*_*chu 5

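Here is an updated version of this useful routine. Setting the vertices directly appears to work for both filled boxes (patch_artist=True) and unfilled ones.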

def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
    """
    Generates a customized boxplot based on the given percentile values
    """
    n_box = len(percentiles)
    box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs) 
    # Creates len(percentiles) no of box plots

    min_y, max_y = float('inf'), -float('inf')

    for box_no, pdata in enumerate(percentiles):
        if len(pdata) == 6:
            (q1_start, q2_start, q3_start, q4_start, q4_end, fliers_xy) = pdata
        elif len(pdata) == 5:
            (q1_start, q2_start, q3_start, q4_start, q4_end) = pdata
            fliers_xy = None
        else:
            raise ValueError("Percentile arrays for customized_box_plot must have either 5 or 6 values")

        # Lower cap
        box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
        # xdata is determined by the width of the box plot

        # Lower whiskers
        box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])

        # Higher cap
        box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])

        # Higher whiskers
        box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])

        # Box
        path = box_plot['boxes'][box_no].get_path()
        path.vertices[0][1] = q2_start
        path.vertices[1][1] = q2_start
        path.vertices[2][1] = q4_start
        path.vertices[3][1] = q4_start
        path.vertices[4][1] = q2_start

        # Median
        box_plot['medians'][box_no].set_ydata([q3_start, q3_start])

        # Outliers
        if fliers_xy is not None and len(fliers_xy[0]) != 0:
            # If outliers exist
            box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
                                           ydata = fliers_xy[1])

            min_y = min(q1_start, min_y, fliers_xy[1].min())
            max_y = max(q4_end, max_y, fliers_xy[1].max())

        else:
            min_y = min(q1_start, min_y)
            max_y = max(q4_end, max_y)

        # The y axis is rescaled to fit the new box plot completely with 10% 
        # of the maximum value at both ends
        axes.set_ylim([min_y*1.1, max_y*1.1])

    # If redraw is set to true, the canvas is updated.
    if redraw:
        axes.figure.canvas.draw()

    return box_plot
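To see the vertex editing in isolation, here is a minimal sketch for a single filled box. It is a variation on the routine above rather than the same code: instead of indexing the vertices by position, it matches them by their current y value, so it does not depend on how many vertices the path has. It assumes an un-notched box, whose path only contains the two y values q1 and q3; the new values 2.0 and 5.0 are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bp = ax.boxplot([[-9, -4, 2, 4, 9]], patch_artist=True)

box = bp["boxes"][0]               # a PathPatch because patch_artist=True
verts = box.get_path().vertices    # (N, 2) array of x, y points, edited in place

# Find the bottom and top edges of the placeholder box before changing anything
old_q1, old_q3 = verts[:, 1].min(), verts[:, 1].max()
lower = verts[:, 1] == old_q1
upper = verts[:, 1] == old_q3

new_q1, new_q3 = 2.0, 5.0          # arbitrary target quartiles
verts[lower, 1] = new_q1
verts[upper, 1] = new_q3

fig.canvas.draw()
plt.close(fig)
```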