Pandas: how to get the most frequent item in a pandas Series?

mom*_*ind 7 python series python-3.x pandas

How do I get the most frequent item in a pandas Series?

Consider the series s:

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

The returned value should be 3.

jpp*_*jpp 7

You can use pd.Series.mode and extract the first value:

res = s.mode().iloc[0]
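
One detail worth keeping in mind is what happens when several values tie for the highest count: mode() returns all of them, sorted, so .iloc[0] picks the smallest. The sketch below illustrates this; the series named tied is a made-up example, not part of the question.

```python
import pandas as pd

# The series from the question: 3 occurs six times.
s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)
most_common = s.mode().iloc[0]

# Hypothetical series with a tie: 1 and 2 both occur twice.
tied = pd.Series([2, 2, 1, 1, 3])
all_modes = tied.mode()          # every tied value, sorted ascending
first_mode = all_modes.iloc[0]   # the smallest of the tied values

print(most_common, all_modes.tolist(), first_mode)
```

If the tie-breaking rule matters for your data, inspect the full result of mode() rather than only its first element.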

This is not necessarily inefficient. As always, test with your own data to see what suits.

import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter

np.random.seed(0)

s = pd.Series(np.random.randint(0, 100, 100000))

def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s[index]
    return val

def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]

%timeit s.mode().iloc[0]                 # 1.82 ms
%timeit pir(s)                           # 2.21 ms
%timeit s.value_counts().index[0]        # 2.52 ms
%timeit mode(s).mode[0]                  # 5.64 ms
%timeit jez_np(s)                        # 8.26 ms
%timeit Counter(s).most_common(1)[0][0]  # 8.27 ms
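
The %timeit lines above only run inside IPython or Jupyter. As a rough sketch of how a similar comparison could be run as a plain script, the standard-library timeit module can be used instead. The candidates dictionary and the repeat counts are illustrative choices, and the timings will differ from the figures quoted above depending on machine and library versions.

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))

# Candidate implementations, keyed by a label for the report.
candidates = {
    "mode": lambda: s.mode().iloc[0],
    "value_counts": lambda: s.value_counts().index[0],
    "Counter": lambda: Counter(s).most_common(1)[0][0],
}

# Run each candidate once to record what it returns.
results = {name: fn() for name, fn in candidates.items()}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(fn, number=10, repeat=3))
    timings[name] = best / 10 * 1000

# Report fastest first.
for name, ms in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {ms:.2f} ms")
```

The methods may break ties differently, so on data with tied counts they can legitimately return different values.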


jez*_*ael 5

Use value_counts and select the first value of the index:

val = s.value_counts().index[0]
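
This works because value_counts sorts its result by count in descending order by default, so the first index label is the most frequent value. A small sketch on the series from the question, which also pulls out the count and the relative frequency (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

counts = s.value_counts()   # sorted by count, largest first
val = counts.index[0]       # the most frequent value
n = counts.iloc[0]          # how many times it occurs
# normalize=True returns fractions of the total instead of raw counts
share = s.value_counts(normalize=True).iloc[0]

print(val, n, round(share, 3))
```

counts.idxmax() should give the same value without relying on the sort order.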

Or Counter.most_common:

from collections import Counter

val = Counter(s).most_common(1)[0][0]
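
most_common(k) returns a list of (value, count) pairs, which is why the one-liner indexes [0][0]. A short sketch on the series from the question that keeps the count as well:

```python
from collections import Counter

import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

counter = Counter(s)               # maps each value to its count
top = counter.most_common(1)       # list with one (value, count) pair
val, n = top[0]

print(val, n)
```

Passing a larger k gives the k most frequent values in descending order of count.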

Or a numpy solution:

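# first-occurrence position and count of every unique value
_, idx, counts = np.unique(s, return_index=True, return_counts=True)
# position of the first occurrence of the most frequent value
index = idx[np.argmax(counts)]
# positional lookup, so it also works when s has a non-default index
val = s.iloc[index]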
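
The snippet above finds the value through the position of its first occurrence. Since np.unique already returns the sorted unique values, a slightly shorter variant can index into those directly. This is a sketch of that alternative, not the answer's original code:

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# values: sorted unique values; counts: occurrences of each value
values, counts = np.unique(s.to_numpy(), return_counts=True)
# argmax returns the first maximum, so ties resolve to the smallest value
val = values[np.argmax(counts)]

print(val)
```

Working on the underlying array means the result does not depend on the Series index.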