I have a pandas df and want to do something like this (in SQL terms):
SELECT * FROM df WHERE column1 = 'a' OR column2 = 'b' OR column3 = 'c' etc.
Right now this works for a single column/value pair:
foo = df.loc[df['column']==value]
However, I'm not sure how to extend it to multiple column/value pairs.
EdC*_*ica 92
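Because of operator precedence, you need to wrap each condition in parentheses and combine them with the bitwise and (&) and or (|) operators:
foo = df.loc[(df['column1'] == 'a') | (df['column2'] == 'b') | (df['column3'] == 'c')]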
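If the column/value pairs are held in a dict (the conditions mapping and the sample data below are hypothetical), the | chain can be built with functools.reduce instead of being typed out by hand. A minimal sketch, not the only way to do it:

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# Hypothetical column -> value mapping, i.e. the SQL "WHERE ... OR ..." terms.
conditions = {"column1": "a", "column2": "b", "column3": "c"}

# One boolean Series per column/value pair.
masks = [df[col] == val for col, val in conditions.items()]

# Fold the masks together with element-wise OR (use operator.and_ for AND).
mask = reduce(operator.or_, masks)

foo = df.loc[mask]
print(foo)
```

Rows 0, 1 and 2 each match one pair, so they are kept; row 3 matches none and is dropped.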
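If you use and or or instead, pandas raises a ValueError saying that the truth value of a Series is ambiguous. Python's and/or need a single True/False, and it is not clear what that should be for a whole Series of comparisons: should it mean that every value matched, that at least one matched, or something else? That is why you should use the bitwise operators, or state the matching rule explicitly with numpy's np.all or np.any.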
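A small sketch of the failure described above and of two element-wise replacements that produce the same row mask; the data is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column1": ["a", "x", "x"], "column2": ["y", "b", "y"]})
cond1 = df["column1"] == "a"
cond2 = df["column2"] == "b"

# Python's `or` calls bool() on cond1; pandas refuses to collapse a
# multi-element Series into one True/False and raises ValueError.
try:
    df.loc[cond1 or cond2]
    or_raised = False
except ValueError as exc:
    or_raised = True
    print("`or` failed:", exc)

# Element-wise alternatives that give the same mask:
by_pipe = cond1 | cond2                       # bitwise OR on the two Series
by_np_any = np.any([cond1, cond2], axis=0)    # True where any condition holds

print(df.loc[by_pipe])
print(df.loc[by_np_any])
```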
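There is also the query method: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html
It has some limitations, though, mostly to do with possible ambiguity between column names and index values.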
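For comparison, a minimal sketch of the same OR filter written with DataFrame.query(), using made-up data with the question's column names. Inside the query string, or/and act element-wise, and @name refers to a Python variable (wanted is an illustrative name, not anything pandas requires).

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# The WHERE clause as a string; `or` here means element-wise OR.
foo = df.query("column1 == 'a' or column2 == 'b' or column3 == 'c'")

# Local variables can be referenced with the @ prefix.
wanted = "b"
bar = df.query("column2 == @wanted")

print(foo)
print(bar)
```

Column names that are not valid Python identifiers (for example, names containing spaces) must be wrapped in backticks inside the query string; that quoting was added in pandas 0.25.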
Phi*_*oud 20
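A more concise, though not necessarily faster, approach is to use DataFrame.isin() together with DataFrame.any():
In [26]: import numpy as np; import pandas as pd
In [27]: n = 10
In [28]: df = pd.DataFrame(np.random.randint(4, size=(n, 2)), columns=list('ab'))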
In [29]: df
Out[29]:
a b
0 0 0
1 1 1
2 1 1
3 2 3
4 2 3
5 0 2
6 1 2
7 3 0
8 1 1
9 2 2
[10 rows x 2 columns]
In [30]: df.isin([1, 2])
Out[30]:
a b
0 False False
1 True True
2 True True
3 True False
4 True False
5 False True
6 True True
7 False False
8 True True
9 True True
[10 rows x 2 columns]
In [31]: df.isin([1, 2]).any(axis=1)
Out[31]:
0 False
1 True
2 True
3 True
4 True
5 True
6 True
7 False
8 True
9 True
dtype: bool
In [32]: df.loc[df.isin([1, 2]).any(axis=1)]
Out[32]:
a b
1 1 1
2 1 1
3 2 3
4 2 3
5 0 2
6 1 2
8 1 1
9 2 2
[8 rows x 2 columns]
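The list form of isin() tests every column against the same set of values. For the question's case, where each column has its own value, isin() also accepts a dict keyed by column name. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# With a dict, each column is checked only against its own list of values;
# columns not named in the dict come back all False.
matches = df.isin({"column1": ["a"], "column2": ["b"], "column3": ["c"]})

# any(axis=1) keeps a row if at least one column matched (SQL OR);
# all(axis=1) would require every column to match (SQL AND).
foo = df.loc[matches.any(axis=1)]
print(foo)
```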
rra*_*rra -1
import numpy as np
import pandas as pd

students = [('jack1', 'Apples1', 341),
            ('Riti1', 'Mangos1', 311),
            ('Aadi1', 'Grapes1', 301),
            ('Sonia1', 'Apples1', 321),
            ('Lucy1', 'Mangos1', 331),
            ('Mike1', 'Apples1', 351),
            ('Mik', 'Apples1', np.nan)]
# Create a DataFrame object; the NaN makes Sale1 a float column
df = pd.DataFrame(students, columns=['Name1', 'Product1', 'Sale1'])
print(df)

 Name1 Product1 Sale1
0 jack1 Apples1 341.0
1 Riti1 Mangos1 311.0
2 Aadi1 Grapes1 301.0
3 Sonia1 Apples1 321.0
4 Lucy1 Mangos1 331.0
5 Mike1 Apples1 351.0
6 Mik Apples1 NaN

# Select rows in the DataFrame above where the 'Product1' column equals 'Apples1'
subset = df[df['Product1'] == 'Apples1']
print(subset)

 Name1 Product1 Sale1
0 jack1 Apples1 341.0
3 Sonia1 Apples1 321.0
5 Mike1 Apples1 351.0
6 Mik Apples1 NaN

# Select rows where 'Product1' equals 'Apples1' AND 'Sale1' is not null
subsetx = df[(df['Product1'] == "Apples1") & (df['Sale1'].notnull())]
print(subsetx)

 Name1 Product1 Sale1
0 jack1 Apples1 341.0
3 Sonia1 Apples1 321.0
5 Mike1 Apples1 351.0

# Select rows where 'Product1' equals 'Apples1' AND 'Sale1' equals 351
subsetx = df[(df['Product1'] == "Apples1") & (df['Sale1'] == 351)]
print(subsetx)

 Name1 Product1 Sale1
5 Mike1 Apples1 351.0

# Another example: rows where 'Product1' is either 'Mangos1' or 'Grapes1'
subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1'])]
print(subsetData)

 Name1 Product1 Sale1
1 Riti1 Mangos1 311.0
2 Aadi1 Grapes1 301.0
4 Lucy1 Mangos1 331.0

This is the original link I found; I edited it slightly: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
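The examples above use & and | separately. When one filter mixes both, the grouping matters because & binds tighter than | in Python, so the OR part needs its own parentheses. A minimal sketch reusing the students data; the 340 threshold is an arbitrary value chosen for illustration.

```python
import numpy as np
import pandas as pd

students = [('jack1', 'Apples1', 341), ('Riti1', 'Mangos1', 311),
            ('Aadi1', 'Grapes1', 301), ('Sonia1', 'Apples1', 321),
            ('Lucy1', 'Mangos1', 331), ('Mike1', 'Apples1', 351),
            ('Mik', 'Apples1', np.nan)]
df = pd.DataFrame(students, columns=['Name1', 'Product1', 'Sale1'])

# Product1 = 'Apples1' AND (Sale1 > 340 OR Sale1 IS NULL)
# The inner parentheses group the OR; NaN > 340 evaluates to False.
mask = (df['Product1'] == 'Apples1') & ((df['Sale1'] > 340) | df['Sale1'].isna())
subset_or = df[mask]
print(subset_or)

# ~ negates a mask: Apples1 rows whose Sale1 is NOT missing.
subset_not = df[(df['Product1'] == 'Apples1') & ~df['Sale1'].isna()]
print(subset_not)
```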