Selecting rows with pandas conditional on multiple equivalencies

Question

I have a pandas df and would like to accomplish something along these lines (in SQL terms):

SELECT * FROM df WHERE column1 = 'a' OR column2 = 'b' OR column3 = 'c' etc.


This currently works for a single column/value pair:

foo = df.loc[df['column']==value]


However, I'm not sure how to extend this to multiple column/value pairs.

Answer 1

EdC*_*ica 92

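Because of operator precedence, you need to wrap each condition in parentheses and combine them with the bitwise AND (&) and OR (|) operators: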

foo = df.loc[(df['column1'] == 'a') | (df['column2'] == 'b') | (df['column3'] == 'c')]
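As a minimal runnable sketch of the expression above, the following builds a small DataFrame and applies the same OR filter. The sample data is hypothetical and only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "a"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "z", "c"],
})

# Each comparison produces a boolean Series. The parentheses are required
# because | binds more tightly than ==.
mask = (df["column1"] == "a") | (df["column2"] == "b") | (df["column3"] == "c")

# Keep every row where at least one condition holds.
foo = df.loc[mask]
print(foo)
```

Storing the mask in its own variable is optional, but it tends to keep long filters readable.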


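If you use `and` or `or`, pandas will likely complain that the comparison is ambiguous. In that case it is unclear whether each value of the Series should be compared, and what it would mean if only one value, or all but one, matched the condition. That is why you should use the bitwise operators, or numpy's np.all or np.any, to state the matching criterion explicitly.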
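A short sketch of the ambiguity described above, assuming a small hypothetical DataFrame: Python's `or` asks for a single truth value for a whole Series, which pandas refuses to give, while `|` and `np.any` work element-wise.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"column1": ["a", "x", "x"], "column2": ["y", "b", "y"]})

cond1 = df["column1"] == "a"
cond2 = df["column2"] == "b"

# `or` needs bool(cond1), and a multi-element Series has no single truth value.
try:
    cond1 or cond2
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Element-wise alternatives: the bitwise operator, or np.any across the
# stacked conditions (axis=0 combines the conditions row by row).
mask_bitwise = cond1 | cond2
mask_any = np.any([cond1, cond2], axis=0)

print(df.loc[mask_bitwise])
print(df.loc[mask_any])
```

Replacing `np.any` with `np.all` in the same call would give the AND of the conditions instead.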

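There is also the query method: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html

However, it has some limitations, mainly related to possible ambiguity between column names and index values.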
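For comparison, here is a hedged sketch of the same filter written with `DataFrame.query`. The sample data and the local variable `value` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x"],
    "column2": ["y", "b", "y"],
    "column3": ["z", "z", "z"],
})

# Inside a query string, `or` and `and` are evaluated element-wise.
foo = df.query("column1 == 'a' or column2 == 'b' or column3 == 'c'")
print(foo)

# A local Python variable can be referenced with the @ prefix.
value = "a"
bar = df.query("column1 == @value")
print(bar)
```

This form reads close to the SQL in the question, though column names that are not valid Python identifiers need extra care.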

In recent pandas versions, you must use loc instead of ix. (10 upvotes)

Answer 2

Phi*_*oud 20

A more concise, though not necessarily faster, approach is to use DataFrame.isin() and DataFrame.any():

In [26]: from pandas import DataFrame; from numpy.random import randint

In [27]: n = 10

In [28]: df = DataFrame(randint(4, size=(n, 2)), columns=list('ab'))

In [29]: df
Out[29]:
   a  b
0  0  0
1  1  1
2  1  1
3  2  3
4  2  3
5  0  2
6  1  2
7  3  0
8  1  1
9  2  2

[10 rows x 2 columns]

In [30]: df.isin([1, 2])
Out[30]:
       a      b
0  False  False
1   True   True
2   True   True
3   True  False
4   True  False
5  False   True
6   True   True
7  False  False
8   True   True
9   True   True

[10 rows x 2 columns]

In [31]: df.isin([1, 2]).any(axis=1)
Out[31]:
0    False
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8     True
9     True
dtype: bool

In [32]: df.loc[df.isin([1, 2]).any(axis=1)]
Out[32]:
   a  b
1  1  1
2  1  1
3  2  3
4  2  3
5  0  2
6  1  2
8  1  1
9  2  2

[8 rows x 2 columns]
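The session above checks every column against the same list of values. To match a different value per column, as in the question, one possible sketch passes a dict to `DataFrame.isin`. The sample data and the name `wanted` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "a"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "z", "c"],
})

# Map each column name to the values accepted for that column.
wanted = {"column1": ["a"], "column2": ["b"], "column3": ["c"]}

# isin() with a dict checks each column against its own list;
# any(axis=1) keeps rows where at least one column matched.
mask = df.isin(wanted).any(axis=1)
foo = df.loc[mask]
print(foo)
```

Using `.all(axis=1)` instead of `.any(axis=1)` would require every listed column to match.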


Answer 3

rra*_*rra -1

The simplest way to do this:


If this helps, please click the up arrow! Thanks!!


import numpy as np
import pandas as pd

students = [('jack1', 'Apples1', 341),
            ('Riti1', 'Mangos1', 311),
            ('Aadi1', 'Grapes1', 301),
            ('Sonia1', 'Apples1', 321),
            ('Lucy1', 'Mangos1', 331),
            ('Mike1', 'Apples1', 351),
            ('Mik', 'Apples1', np.nan)]

# Create a DataFrame object (the NaN makes Sale1 a float column)
df = pd.DataFrame(students, columns=['Name1', 'Product1', 'Sale1'])
print(df)

    Name1 Product1  Sale1
0   jack1  Apples1  341.0
1   Riti1  Mangos1  311.0
2   Aadi1  Grapes1  301.0
3  Sonia1  Apples1  321.0
4   Lucy1  Mangos1  331.0
5   Mike1  Apples1  351.0
6     Mik  Apples1    NaN

# Select rows for which the 'Product1' column equals 'Apples1'
subset = df[df['Product1'] == 'Apples1']
print(subset)

    Name1 Product1  Sale1
0   jack1  Apples1  341.0
3  Sonia1  Apples1  321.0
5   Mike1  Apples1  351.0
6     Mik  Apples1    NaN

# Select rows for which 'Product1' equals 'Apples1' AND 'Sale1' is not null
subsetx = df[(df['Product1'] == 'Apples1') & (df['Sale1'].notnull())]
print(subsetx)

    Name1 Product1  Sale1
0   jack1  Apples1  341.0
3  Sonia1  Apples1  321.0
5   Mike1  Apples1  351.0

# Select rows for which 'Product1' equals 'Apples1' AND 'Sale1' equals 351
subsetx = df[(df['Product1'] == 'Apples1') & (df['Sale1'] == 351)]
print(subsetx)

   Name1 Product1  Sale1
5  Mike1  Apples1  351.0

# Another example: 'Product1' is either 'Mangos1' or 'Grapes1'
subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1'])]
print(subsetData)

   Name1 Product1  Sale1
1  Riti1  Mangos1  311.0
2  Aadi1  Grapes1  301.0
4  Lucy1  Mangos1  331.0


This is the original link I found; I edited it slightly: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/


Archived: 11 years, 10 months ago
Views: 114159
Last activity: 6 years, 3 months ago