How to make a matplotlib/pandas bar chart look like a histogram?

tmt*_*prt 5 python plot numpy matplotlib pandas

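Plotting difference between bar and hist

Given some data in a pandas.Series, rv, there is a difference between:

  1. Calling hist directly on the data to plot

  2. Calculating the histogram results (with numpy.histogram) and then plotting them with bar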
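To make the comparison concrete, here is a minimal sketch (not from the original post) suggesting that both routes compute the same bin heights and edges, so the visible difference comes from how the bars are drawn rather than from the numbers. The seed, the sample size of 1000, and the 20 bins are arbitrary assumptions, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data; seed and size are arbitrary choices
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Route 1: let hist bin and draw in a single call
fig, ax = plt.subplots()
n, edges_hist, _ = ax.hist(data, bins=20, density=True)
plt.close(fig)

# Route 2: bin first with numpy.histogram, draw later
heights, edges_np = np.histogram(data, bins=20, density=True)

# 20 bins are described by 21 edges in both cases
print(len(heights), len(edges_np))
```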

Sample data generation

%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')

# Setup size and distribution
size = 50000
distribution = stats.norm()

# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)

# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)

# Get histogram of random data
# (density=True replaces the normed=True keyword, which has been removed from NumPy)
y, x = np.histogram(rv, bins=50, density=True)
# Convert the 51 bin edges into 50 bin centers
x = (x[:-1] + x[1:]) / 2.0
hist = pd.Series(y, x)

hist() plotting

ax = pdf.plot(lw=2, label='PDF', legend=True)
# density=True replaces the normed=True keyword, which has been removed from matplotlib
rv.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Random Samples', legend=True, ax=ax)

[Figure: hist() plot]

bar() plotting

ax = pdf.plot(lw=2, label='PDF', legend=True)
hist.plot(kind='bar', alpha=0.5, label='Random Samples', legend=True, ax=ax)

[Figure: bar() plot]

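How can the bar plot be made to look like the hist plot?

The use case requires saving only the histogrammed data, to be used and plotted later (it is typically much smaller than the original data).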
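As a rough illustration of that use case, the sketch below bins the samples once, stores only the bin heights and edges, and reloads them later. The file name `hist_data.npz`, the temporary directory, and the choice of `np.savez` are assumptions for illustration; the original post does not say how the data is stored.

```python
import os
import tempfile

import numpy as np

# Illustrative data; seed and size are arbitrary choices
rng = np.random.default_rng(0)
samples = rng.normal(size=50_000)

# Bin once, then keep only the histogram
heights, edges = np.histogram(samples, bins=50, density=True)

# Hypothetical storage location and file name
path = os.path.join(tempfile.mkdtemp(), "hist_data.npz")
np.savez(path, heights=heights, edges=edges)

# Later: reload the histogram without touching the raw samples
with np.load(path) as stored:
    heights_loaded = stored["heights"]
    edges_loaded = stored["edges"]

# 50 heights + 51 edges, versus 50,000 raw values
stored_values = heights_loaded.size + edges_loaded.size
print(stored_values, samples.size)
```

Storing the edges rather than only the bin centers keeps the bin widths recoverable, which matters if the bins are not evenly spaced.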

tmt*_*prt 8

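Bar plotting difference

Obtaining a bar plot that looks like the hist plot requires some manipulation of the default behavior of bar.

  1. Force bar to use the actual x data for the plotting range by passing both x (hist.index) and y (hist.values). The default bar behavior is to plot the y data against an arbitrary range and use the x data as labels.
  2. Set the width parameter according to the actual step size of the x data (the default is 0.8).
  3. Set the align parameter to 'center'.
  4. Set the axis legend manually.

These changes need to be made through matplotlib's bar(), called on the axis (ax), rather than through pandas's bar(), called on the data (hist).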
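The first point can be checked on a toy series. The sketch below uses five made-up bin centers and heights (not the data from the question) and compares where each API places its bars: pandas is expected to draw them at integer positions 0..4 and use the index only for tick labels, while `ax.bar` places them at the actual index values. The Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up histogram: five bin centers and their heights
centers = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
hist = pd.Series([0.1, 0.3, 0.4, 0.3, 0.1], index=centers)

# pandas bar(): categorical placement, index values become tick labels
fig1, ax1 = plt.subplots()
hist.plot(kind="bar", ax=ax1)
pandas_centers = [p.get_x() + p.get_width() / 2 for p in ax1.patches]

# matplotlib bar(): bars placed at the real x values, width equal to the step
fig2, ax2 = plt.subplots()
step = centers[1] - centers[0]
ax2.bar(hist.index, hist.values, width=step, align="center")
mpl_centers = [p.get_x() + p.get_width() / 2 for p in ax2.patches]

print(pandas_centers)
print(mpl_centers)
plt.close("all")
```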

Example plotting

%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')

# Setup size and distribution
size = 50000
distribution = stats.norm()

# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)

# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)

# Get histogram of random data
# (density=True replaces the normed=True keyword, which has been removed from NumPy)
y, x = np.histogram(rv, bins=50, density=True)
# Convert the 51 bin edges into 50 bin centers
x = (x[:-1] + x[1:]) / 2.0
hist = pd.Series(y, x)

# Plot previously histogrammed data
ax = pdf.plot(lw=2, label='PDF', legend=True)
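# Bin width = distance between neighbouring bin centers
# (take abs of the difference; abs of each term gives a negative width for negative centers)
w = abs(hist.index[1] - hist.index[0])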
ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
ax.legend(['PDF', 'Random Samples'])

[Figure: histogram]
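The width computation in the example assumes evenly spaced bins. As a hedged variation that is not part of the original answer, the sketch below keeps the bin edges instead of the centers and passes `align='edge'` with per-bar widths from `np.diff`, which should also cover unevenly spaced bins. The edge values, seed, and sample size are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data; seed and size are arbitrary choices
rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)

# Deliberately uneven bin edges (arbitrary values)
edges_in = np.array([-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0])
heights, edges = np.histogram(samples, bins=edges_in, density=True)

# Each bar starts at its left edge and spans exactly its own bin
widths = np.diff(edges)
fig, ax = plt.subplots()
bars = ax.bar(edges[:-1], heights, width=widths, align="edge",
              alpha=0.5, label="Random Samples")
ax.legend()

# With density=True the bar areas sum to 1
area = float(np.sum(heights * widths))
print(area)
plt.close(fig)
```

Passing `label=` to `ax.bar` and calling `ax.legend()` without arguments is an alternative to listing the legend entries by hand.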