Getting a list of indices where a pandas boolean Series is True

Question

Jam*_*own 14 python series pandas

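I have a pandas Series with boolean entries. I would like to get a list of the indices where the values are True.

For example, the input pd.Series([True, False, True, True, False, False, False, True])

should yield the output [0,2,3,7].

I can do it with a list comprehension, but is there something cleaner or faster?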

Answer 1

raf*_*elc 29

Using `Boolean Indexing`

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([True, False, True, True, False, False, False, True])
>>> s[s].index
Int64Index([0, 2, 3, 7], dtype='int64')

If you need an np.array object, take the .values

>>> s[s].index.values
array([0, 2, 3, 7])
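Since the question asks for a plain list rather than an `Index` or an array, a minimal sketch of the conversion may help. It assumes the same `s` as above and uses `Index.tolist()`, which returns built-in Python values.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# Index.tolist() converts the selected index labels to built-in Python ints
true_idx = s[s].index.tolist()
print(true_idx)  # [0, 2, 3, 7]
```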

Using `np.nonzero`

>>> np.nonzero(s)
(array([0, 2, 3, 7]),)

Using `np.flatnonzero`

>>> np.flatnonzero(s)
array([0, 2, 3, 7])

Using `np.where`

>>> np.where(s)[0]
array([0, 2, 3, 7])

Using `np.argwhere`

>>> np.argwhere(s).ravel()
array([0, 2, 3, 7])

Using `pd.Series.index`

>>> s.index[s]
Int64Index([0, 2, 3, 7], dtype='int64')

Using Python's built-in `filter`

>>> [*filter(s.get, s.index)]
[0, 2, 3, 7]

Using a `list comprehension`

>>> [i for i in s.index if s[i]]
[0, 2, 3, 7]

What if the Series index has labels rather than an index range? (3 upvotes)
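On the labels question, a small sketch of the difference: the index-based approaches return the labels, while the NumPy functions return integer positions. The string labels below are hypothetical, chosen only for illustration, and `to_numpy()` is used so that NumPy sees a plain array.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, chosen only for illustration
s = pd.Series([True, False, True, True], index=["a", "b", "c", "d"])

labels = s[s].index.tolist()              # index labels where s is True
positions = np.flatnonzero(s.to_numpy())  # integer positions where s is True

print(labels)     # ['a', 'c', 'd']
print(positions)  # [0 2 3]

# Positions can be mapped back to labels through the index
assert s.index[positions].tolist() == labels
```

With a default range index the two results coincide, which is why all the variants above print the same numbers.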

Answer 2

Chr*_*yer 15

As a complement to rafaelc's answer, here are the corresponding timings (from fastest to slowest) for the following setup

import numpy as np
import pandas as pd
s = pd.Series([x > 0.5 for x in np.random.random(size=1000)])
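The timings in this answer use IPython's `timeit` magic. For readers outside IPython, here is a rough sketch of a comparable measurement with the standard-library `timeit` module. The fixed seed, the repeat counts, and the `candidates` dictionary are assumptions made for this sketch, and the numbers it prints will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatability)
s = pd.Series(rng.random(1000) > 0.5)

# A few of the approaches from the answers, wrapped as zero-argument callables
candidates = {
    "np.where": lambda: np.where(s)[0],
    "np.flatnonzero": lambda: np.flatnonzero(s),
    "s.index[s]": lambda: s.index[s],
    "s[s].index": lambda: s[s].index,
}

for name, fn in candidates.items():
    # Best of 3 repeats, 200 calls each, reported per call in microseconds
    best = min(timeit.repeat(fn, number=200, repeat=3)) / 200
    print(f"{name:>15}: {best * 1e6:8.1f} µs")
```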

Using `np.where`

>>> timeit np.where(s)[0]
12.7 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using `np.flatnonzero`

>>> timeit np.flatnonzero(s)
18 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using `pd.Series.index`

I was quite surprised by the time difference compared to boolean indexing, since boolean indexing is the more commonly used of the two.

>>> timeit s.index[s]
82.2 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using `Boolean Indexing`

>>> timeit s[s].index
1.75 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need an np.array object, take the .values

>>> timeit s[s].index.values
1.76 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you would like a slightly easier-to-read version <-- not in the original answer

>>> timeit s[s==True].index
1.89 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using `pd.Series.where` <-- not in the original answer

>>> timeit s.where(s).dropna().index
2.22 ms ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.where(s == True).dropna().index
2.37 ms ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

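Using `pd.Series.mask` <-- not in the original answer

Note that `mask` hides the entries where the condition is True, so the two calls timed here leave the index of the False entries; `s.mask(~s)` selects the True ones.

>>> timeit s.mask(s).dropna().index
2.29 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.mask(s == True).dropna().index
2.44 ms ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)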
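Because `where` and `mask` are complements of each other, it can be worth checking which entries each one keeps. The following is a small sketch on the eight-element Series from the question (not the 1000-element timing setup); the variable names are chosen for this sketch only.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# where keeps the values where the condition holds and fills the rest with NaN
kept_by_where = s.where(s).dropna().index.tolist()

# mask does the opposite: it replaces values where the condition holds
kept_by_mask = s.mask(s).dropna().index.tolist()

# inverting the condition makes mask select the True entries
kept_by_inverted_mask = s.mask(~s).dropna().index.tolist()

print(kept_by_where)          # [0, 2, 3, 7]
print(kept_by_mask)           # [1, 4, 5, 6]
print(kept_by_inverted_mask)  # [0, 2, 3, 7]
```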

Using a `list comprehension`

>>> timeit [i for i in s.index if s[i]]
13.7 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using Python's built-in `filter`

>>> timeit [*filter(s.get, s.index)]
14.2 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using `np.nonzero` <-- did not work out of the box for me

>>> timeit np.nonzero(s)
ValueError: Length of passed values is 1, index implies 1000.

Using `np.argwhere` <-- did not work out of the box for me

>>> timeit np.argwhere(s).ravel()
ValueError: Length of passed values is 1, index implies 1000.
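Whether the two `ValueError`s above appear seems to depend on the pandas and NumPy versions in use. One way to sidestep the issue, sketched here as an assumption rather than a confirmed fix for the exact setup above, is to hand NumPy a plain array via `Series.to_numpy()` so that no pandas wrapping of the result is attempted.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

arr = s.to_numpy()  # plain boolean ndarray, detached from the pandas index

nonzero_idx = np.nonzero(arr)[0]
argwhere_idx = np.argwhere(arr).ravel()

print(nonzero_idx)   # [0 2 3 7]
print(argwhere_idx)  # [0 2 3 7]
```

Both results are integer positions, so with a non-default index they need to be mapped back through `s.index` to obtain labels.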

Archived:	7 years, 5 months ago
Views:	8,131
Last recorded:	7 years, 5 months ago