Python Pandas: select rows based on membership in another collection (a set)

Ray · score 2 · tags: python, indexing, set, conditional-statements, pandas

Suppose my DataFrame is constructed as follows:

import pandas
import numpy

column_names = ["name", "age", "score"]
names = numpy.random.choice(["Jorge", "Xavier", "Joaquin", "Juan", "Jose"], 50)
ages = numpy.random.randint(0, 100, 50)
scores = numpy.random.rand(50)
df = pandas.DataFrame.from_dict(dict(zip(column_names, [names, ages, scores])))

The first 10 rows of the DataFrame above look like this:

   age     name     score
0   15    Jorge  0.031380
1   44     Juan  0.373199
2   84   Xavier  0.999065
3   55     Juan  0.159873
4   55  Joaquin  0.211931
5   33     Juan  0.484350
6   22   Xavier  0.510276
7   86  Joaquin  0.490013
8    2     Jose  0.185086
9   51     Juan  0.979015

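I would like to select the rows whose value in the name column belongs to {"Xavier", "Joaquin"}. Instinctively, I tried something like df.iloc[df["name"] in {"Xavier", "Joaquin"}, :], but that does not work. How can I do this?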
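The in expression fails because Python evaluates x in some_set by hashing x as a single object. A pandas Series is mutable and therefore unhashable, so the check raises a TypeError instead of being applied element by element. A minimal sketch on a small, hypothetical DataFrame shows the error and a plain element-wise workaround with Series.map, which is correct but performs one Python-level lookup per row:

```python
import pandas

# Small hypothetical DataFrame standing in for the one in the question
df = pandas.DataFrame({
    "name": ["Jorge", "Juan", "Xavier", "Joaquin", "Jose"],
    "age": [15, 44, 84, 55, 2],
})
wanted = {"Xavier", "Joaquin"}

# Python's `in` tries to hash the whole Series as one object; Series are unhashable
in_raised = False
try:
    _ = df["name"] in wanted
except TypeError as exc:
    in_raised = True
    print("TypeError:", exc)

# Element-wise membership: builds a boolean Series, one Python set lookup per row
mask = df["name"].map(lambda n: n in wanted)
selected = df.loc[mask, :]
print(selected)
```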

Note

I know I can solve this particular example with:

df.loc[numpy.logical_or(df["name"] == "Xavier", df["name"] == "Joaquin"), :]

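But that is not the point; this is only a simplified version of my actual problem. My real DataFrame has 2,340,923 rows, my set of names, names, has 3,624 elements, and I want to select the rows whose name is a member of names.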
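For reference, the logical_or approach can be generalized to an arbitrary set with numpy.logical_or.reduce, which folds one equality mask per name into a single mask. The sketch below uses small, hypothetical data; it gives the right rows, but it makes one full pass over the column per name, so for a 3,624-name set over 2,340,923 rows it would do about 3,624 passes. That cost is why a single vectorized membership test is preferable.

```python
import numpy
import pandas

# Small hypothetical data; not the questioner's real DataFrame
df = pandas.DataFrame({
    "name": ["Jorge", "Juan", "Xavier", "Joaquin", "Jose", "Xavier"],
    "score": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
names = {"Xavier", "Joaquin"}

# One boolean array per name, each requiring a full pass over the column ...
per_name_masks = [(df["name"] == n).to_numpy() for n in names]
# ... combined with element-wise OR into a single row mask
mask = numpy.logical_or.reduce(per_name_masks)

selected = df.loc[mask, :]
print(selected)
```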

Answer by jezrael · score 5

I think you need isin:

print (df.loc[df["name"].isin(["Xavier", "Joaquin"]), :])
    age     name     score
1    66  Joaquin  0.767056
2    17  Joaquin  0.721369
7    53  Joaquin  0.209415
10    9   Xavier  0.394815
13   20  Joaquin  0.276596
14   17   Xavier  0.810725
15   76   Xavier  0.918273
17   91  Joaquin  0.974723
18   39   Xavier  0.869607
21    3   Xavier  0.200578
22   34  Joaquin  0.938018
23   90   Xavier  0.664387
26   51   Xavier  0.946753
28   49   Xavier  0.859911
30   22  Joaquin  0.602381
34    7   Xavier  0.759837
35   96  Joaquin  0.790691
39   13  Joaquin  0.599557
40   10   Xavier  0.563933
41   69   Xavier  0.983787
43   58   Xavier  0.542903
44    8  Joaquin  0.307106
45   77  Joaquin  0.330278
46   55  Joaquin  0.980077
47   12   Xavier  0.177509
49   15  Joaquin  0.590958
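A self-contained sketch of the same isin call on small, fixed data (hypothetical values, so the result is deterministic). isin returns a boolean Series aligned with the DataFrame's index, which .loc then uses as a row mask; the mask can be inverted with ~ to get the rows whose name is not in the collection. DataFrame.query also accepts an in expression, with @ referring to a local Python variable, which is the closest form to the syntax tried in the question.

```python
import pandas

# Hypothetical fixed data so the selection is reproducible
df = pandas.DataFrame({
    "name": ["Jorge", "Juan", "Xavier", "Joaquin", "Jose", "Xavier"],
    "age": [15, 44, 84, 55, 2, 22],
})
wanted = ["Xavier", "Joaquin"]

# isin returns a boolean Series aligned with df.index
mask = df["name"].isin(wanted)
selected = df.loc[mask, :]

# ~ inverts the mask: rows whose name is NOT in the collection
rest = df.loc[~mask, :]

# query evaluates `in` as a membership test; @wanted refers to the local list
via_query = df.query("name in @wanted")

print(selected)
print(rest)
```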

It works just as well with a set:

names = set(["Xavier", "Joaquin"])
print (df.loc[df["name"].isin(names), :])

(The output is identical to the list-based example above.)
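For the question's real case (a set of 3,624 names over millions of rows), isin on a string column is, as far as I know, backed by a hash-table lookup (an implementation detail, not a documented guarantee), so the work grows roughly with the number of rows rather than rows times names. The sketch below uses synthetic data at a reduced scale; the vocabulary size, row count, and seed are placeholders, not the questioner's data. It checks that isin with a large set agrees with a plain per-row Python membership test.

```python
import numpy
import pandas

rng = numpy.random.default_rng(0)

# Hypothetical vocabulary of 10,000 distinct names; all sizes are placeholders
vocabulary = numpy.array([f"name_{i}" for i in range(10_000)], dtype=object)
df = pandas.DataFrame({
    "name": rng.choice(vocabulary, 200_000),
    "score": rng.random(200_000),
})

# A large lookup set, analogous to the 3,624-name set in the question
names = set(rng.choice(vocabulary, 3_624, replace=False))

# Vectorized membership test over the whole column
mask = df["name"].isin(names)
selected = df.loc[mask, :]

# Reference result: one Python set lookup per row (slow but obviously correct)
reference = numpy.array([n in names for n in df["name"]])

print(len(selected), "of", len(df), "rows selected")
```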