How do I find quantiles from frequency data?

Question

2ya*_*yan 3 python statistics quantile pandas

Suppose I have a data table of customer purchases like this:

Customer | Price | Quantity Sold
a        | 200   | 3.3
b        | 120   | 4.1
c        | 40    | 12.0
d        | 30    | 16.76

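This is a rough representation of the data table, which holds the customer, the price, and the quantity sold for a single product.

I want to work out how to calculate the median purchase price from this information.

I am a little confused about the approach, because I know that getting a quantile in pandas is easy with data[row].quantile(x)

But since each row actually represents more than one observation, I am not sure how to get the quantiles.

Edit: Most importantly, the main issue is that the quantity sold is not discrete. It is a continuous variable. (Think meters, kilograms, and so on, so creating more rows is not an option.)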

Answer 1

Sim*_*wly 5

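For a set of discrete values, the median is found by sorting and taking the center value. However, since your Quantity values are continuous, it looks like you are really after the median of a probability distribution, in which Price is distributed with the relative frequencies given by Quantity. Sorting the data by Price and accumulating Quantity gives a graphical representation of your problem, with cumulative Quantity on the x-axis and Price on the y-axis.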
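To make the cumulative view concrete without a plot, here is a minimal sketch that rebuilds the question's table and prints the running total of Quantity after sorting by Price. The column names cum_quantity and cum_share are illustrative and not part of the original answer.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "Customer": ["a", "b", "c", "d"],
    "Price": [200, 120, 40, 30],
    "Quantity Sold": [3.3, 4.1, 12.0, 16.76],
})

# Sort by price and accumulate quantity. cum_share (a hypothetical helper
# column) is the fraction of total quantity sold at or below each price.
view = df.sort_values("Price").reset_index(drop=True)
view["cum_quantity"] = view["Quantity Sold"].cumsum()
view["cum_share"] = view["cum_quantity"] / view["Quantity Sold"].sum()

print(view[["Price", "Quantity Sold", "cum_quantity", "cum_share"]])
```

In this table, cum_share first reaches 0.5 in the row with Price 40, which is the point the median is read from.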

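You can see from this graph that the median is 40 (the y value at the midpoint of the x-axis). This is to be expected, since the quantities sold at the two lowest prices are very large. The median can be calculated from your dataframe as follows: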

# Sort by price so the cumulative quantity runs from cheapest to most expensive.
df = df.sort_values('Price')
cumul = df['Quantity Sold'].cumsum()
# Get the row position where the cumulative quantity reaches half the total.
total = df['Quantity Sold'].sum()
index = (cumul < 0.5 * total).sum()
# Get the price at that position
result = df['Price'].iloc[index]

Any other quantile of the same data can be calculated by using a different fraction of the total.
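As a sketch of that generalization, the same steps can be wrapped in a small helper that takes the fraction as a parameter. The function name weighted_quantile and its arguments are hypothetical, not a pandas API. It assumes a non-empty frame with non-negative weights, and it returns one of the observed prices without interpolating between them.

```python
import pandas as pd


def weighted_quantile(df, value_col, weight_col, q):
    """Return the smallest value whose cumulative weight share reaches q."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = df.sort_values(value_col)
    cumul = ordered[weight_col].cumsum()
    total = ordered[weight_col].sum()
    # Count the rows whose cumulative weight is still below the target.
    # Clamp the position so that q = 1 cannot run past the last row.
    index = min(int((cumul < q * total).sum()), len(ordered) - 1)
    return ordered[value_col].iloc[index]


# Data copied from the table in the question.
df = pd.DataFrame({
    "Price": [200, 120, 40, 30],
    "Quantity Sold": [3.3, 4.1, 12.0, 16.76],
})

for q in (0.25, 0.5, 0.75, 0.9):
    print(q, weighted_quantile(df, "Price", "Quantity Sold", q))
```

With the question's data this gives 30 for q = 0.25, 40 for both the median and q = 0.75, and 120 for q = 0.9.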
