What is the inverse of the quantile function on a pandas Series?

Question

Man*_*gia 31 python quantile pandas

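The quantile function gives the quantile of a given pandas Series s.

For example,

s.quantile(0.9) is 4.2

Is there an inverse function (i.e. the cumulative distribution) that finds the value x such that

s.quantile(x) = 4

Thanks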

Answer 1

fer*_*sjp 45

I had the same question as you! I found a simple way to get the inverse of the quantile using scipy.

# libs required
from scipy import stats
import pandas as pd
import numpy as np

# generate random data with a fixed seed (to be reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0, 1, 10), columns=['a'])

# quantile function (select the column so the result is a scalar)
x = df['a'].quantile(0.5)

# inverse of quantile; the result is a percentage in [0, 100], not a fraction
stats.percentileofscore(df['a'], x)

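It is worth noting that if your series contains NaN values, quantile and percentileofscore do not seem to treat them the same way, i.e. the two functions are not exact inverses of each other. (3 upvotes)
Note that when the quantile does not line up exactly with a value, pandas interpolation makes the results inconsistent; for example, try `quantile(0.51)` and the inverse does not give the same number back. (2 upvotes)
Just do y = stats.percentileofscore(df['a'].dropna(), x) to get the inverse that matches df['a'].quantile(y / 100) == x (percentileofscore returns a percentage, hence the division by 100). (2 upvotes)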
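A minimal sketch of the caveats raised in the comments, using a small hand-made Series (the data and variable names are assumptions for illustration, not part of the answer). It drops NaN explicitly, converts the percentage returned by `percentileofscore` into a fraction, and shows that the result depends on the `kind` argument, so the round trip with `quantile` is only approximate.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative data (an assumption for this sketch), including one NaN.
s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, 5.0])

# Series.quantile skips NaN by default, so this is the median of 1..5.
x = s.quantile(0.5)

# Drop NaN explicitly so both functions see the same five values.
clean = s.dropna()

# percentileofscore returns a percentage; divide by 100 to get a fraction.
inverse_by_kind = {
    kind: stats.percentileofscore(clean, x, kind=kind) / 100
    for kind in ("rank", "weak", "strict", "mean")
}
print(x, inverse_by_kind)
```

On this small sample only `kind="mean"` gives back 0.5; the other kinds differ because the data are discrete and the score coincides with an observed value.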

Answer 2

ILo*_*ing 12

Sorting can be expensive; if you only need a single value, I guess you are better off computing it with:

s = pd.Series(np.random.uniform(size=1000))
( s < 0.7 ).astype(int).mean() # =0.7ish

There may be a way to avoid the int(bool) shenanigan.

This is clever. (s < 0.7).mean() works for me on pandas 0.23.0 (2 upvotes)
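As the comment suggests, the integer cast is not needed, because the mean of a boolean Series is already the fraction of True values. A small sketch with made-up data (the values are assumptions for illustration):

```python
import pandas as pd

# Illustrative values (an assumption for this sketch).
s = pd.Series([0.1, 0.5, 0.7, 0.9])

# Fraction of values strictly below 0.7, with and without the cast.
with_cast = (s < 0.7).astype(int).mean()
without_cast = (s < 0.7).mean()

print(with_cast, without_cast)
```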

Answer 3

toz*_*CSS 9

The fraction of records in s that are less than x:

# Find the percentile of `x` in `s`
(s<x).mean()  # i.e., (s<x).sum()/len(s)

That's it.

You can also use pandas.Series.searchsorted when s is sorted:

s.searchsorted(x)/len(s)

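A short sketch comparing the two approaches on made-up data (the values are assumptions for illustration). `searchsorted` only gives a meaningful answer on sorted values, so the Series is sorted first; its `side` argument decides whether values equal to x are counted.

```python
import pandas as pd

# Illustrative data with a tie (an assumption for this sketch).
s = pd.Series([3, 1, 4, 1, 5])
x = 3

# Direct comparison: fraction of values strictly below x.
frac_less = (s < x).mean()

# searchsorted requires sorted values.
sorted_s = s.sort_values().reset_index(drop=True)

# side="left" counts values < x, side="right" counts values <= x.
frac_left = sorted_s.searchsorted(x, side="left") / len(sorted_s)
frac_right = sorted_s.searchsorted(x, side="right") / len(sorted_s)

print(frac_less, frac_left, frac_right)
```

If the Series has to be sorted just for one lookup, the boolean mean is usually the simpler choice; `searchsorted` pays off when many lookups reuse the same sorted data.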

Answer 4

Ana*_*a 秀 8

Mathematically, you are looking for the CDF, i.e. the probability that s is less than or equal to a value (or quantile) q:

F(q) = Pr[s <= q]

You can use numpy and try the following one-liner:

np.mean(s.to_numpy() <= q)

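The one-liner can be extended to evaluate F at several values of q at once. The helper below is a hypothetical sketch (the name `ecdf` and the choice to drop NaN are assumptions, not part of the answer); it uses NumPy broadcasting, so memory grows with the number of values times the number of query points.

```python
import numpy as np
import pandas as pd


def ecdf(s, q):
    """Fraction of non-NaN values in s that are <= q, for scalar or array-like q."""
    values = s.dropna().to_numpy()
    q = np.atleast_1d(np.asarray(q, dtype=float))
    # Compare every value against every query point, then average over the values.
    return (values[:, None] <= q[None, :]).mean(axis=0)


# Illustrative data (an assumption for this sketch).
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, np.nan])
print(ecdf(s, [0, 3, 5]))
```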

Answer 5

Cal*_* Ku 7

Just ran into the same problem. Here's my two cents.

def inverse_percentile(arr, num):
    arr = sorted(arr)
    i_arr = [i for i, x in enumerate(arr) if x > num]

    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1

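The index of the first sorted element greater than num is the same as the number of elements less than or equal to num, so the standard-library `bisect_right` can replace the list comprehension. This is a sketch of that swap, not the answer's own code; the name `inverse_percentile_bisect` is hypothetical and the input is assumed to be non-empty.

```python
from bisect import bisect_right


def inverse_percentile(arr, num):
    # Version from the answer, repeated here for comparison.
    arr = sorted(arr)
    i_arr = [i for i, x in enumerate(arr) if x > num]
    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1


def inverse_percentile_bisect(arr, num):
    # bisect_right returns how many sorted elements are <= num.
    arr = sorted(arr)
    return bisect_right(arr, num) / len(arr)


# Illustrative data (an assumption for this sketch).
data = [3, 1, 4, 1, 5, 9, 2, 6]
for num in (0, 1, 4, 9, 10):
    print(num, inverse_percentile(data, num), inverse_percentile_bisect(data, num))
```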

Answer 6

Mik*_*ike 5

I don't know of a one-liner, but you can implement it with scipy:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0,1,(11)), columns=['a'])
# sort it by the desired series and calculate the percentile
# (DataFrame.sort was removed from pandas; sort_values is the current name)
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# setup the interpolator using the value as the index
interp = interp1d(sdf['a'], sdf['b'])

# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)

interp can also accept a list, a numpy array, or a pandas Series, really any array-like!
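The same idea can be sketched with NumPy alone, swapping scipy's `interp1d` for `np.interp`. This is an illustrative variant, not the answer's code: the helper name `inverse_quantile` and the seeded random data are assumptions, and it presumes the values are distinct and free of NaN. It relies on the default linear interpolation of `Series.quantile`, which places the sorted values at evenly spaced fractions from 0 to 1.

```python
import numpy as np
import pandas as pd

# Seeded sample data (an assumption for this sketch).
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 1, 11))

# Sorted values paired with the fractions quantile() assigns to them.
sorted_vals = np.sort(s.to_numpy())
fractions = np.linspace(0, 1, len(sorted_vals))


def inverse_quantile(x):
    # Linear interpolation from value back to fraction.
    return np.interp(x, sorted_vals, fractions)


p = 0.57
x = s.quantile(p)
print(p, inverse_quantile(x))
```

Unlike `interp1d` with its default settings, which raises an error for inputs outside the data range, `np.interp` clamps such inputs to 0 or 1.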

Asked:	11 years, 2 months ago
Viewed:	12109 times
Last active:	6 years, 3 months ago