Filtering a pandas DataFrame using a list of conditions

Question

I have a pandas DataFrame with two columns, 'col1' and 'col2'.

I can filter on specific values in both columns with:

df[ (df["col1"]=='foo') & (df["col2"]=='bar')]

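Is there a way to filter on both columns at once?

I naively tried restricting the DataFrame to the two columns, but my best guess for the right-hand side of the comparison doesn't work: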

df[df[["col1","col2"]]==['foo','bar']]

It gives me this error:

ValueError: Invalid broadcasting comparison [['foo', 'bar']] with block values

I need this because both the column names and the number of columns the conditions apply to will vary.

Answer 1

Ale*_*der 8

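As far as I know, pandas has no built-in way to do exactly what you want. However, although the following solution may not be the prettiest, you can zip a pair of parallel lists as follows: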

cols = ['col1', 'col2']
conditions = ['foo', 'bar']

df[eval(" & ".join(["(df['{0}'] == '{1}')".format(col, cond) 
   for col, cond in zip(cols, conditions)]))]

The string join produces the following:

>>> " & ".join(["(df['{0}'] == '{1}')".format(col, cond) 
    for col, cond in zip(cols, conditions)])

"(df['col1'] == 'foo') & (df['col2'] == 'bar')"

You can then evaluate that string with eval:

df[eval("(df['col1'] == 'foo') & (df['col2'] == 'bar')")]

For example:

import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})

>>> df
  col1  col2
0  foo   bar
1  bar  spam
2  baz   ham

>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
            for col, cond in zip(cols, conditions)]))]
  col1 col2
0  foo  bar
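
One detail worth noting: the first snippet wraps every value in quotes, so a numeric condition such as 1 would be compared as the string '1', whereas the example just above formats values with repr. Below is a minimal sketch of the repr variant with a non-string value. The numeric column 'col3' and its data are made up for illustration and are not part of the question. Because eval executes whatever the string contains, this should only ever be fed trusted column names and values.

```python
import pandas as pd

# Hypothetical data: 'col3' is an invented numeric column for this sketch.
df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col3': [1, 2, 1]})
cols = ['col1', 'col3']
conditions = ['foo', 1]

# {0!r} / {1!r} apply repr(), so strings keep their quotes and numbers stay numbers.
expr = " & ".join(
    "(df[{0!r}] == {1!r})".format(col, cond)
    for col, cond in zip(cols, conditions)
)
print(expr)

# Pass df explicitly so eval does not depend on the caller's scope.
result = df[eval(expr, {"df": df})]
print(result)
```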

Answer 2

Mic*_*off 5

I'd like to point out an alternative to the accepted answer, since eval is not needed to solve this problem.

from functools import reduce

import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']
# Materialize the pairs: in Python 3, zip() returns an iterator, which has no len().
conditions = list(zip(cols, values))

# Version 1: combine the boolean masks with an explicit loop.
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    result = comps[0]
    for comp in comps[1:]:
        result &= comp
    return result

# Version 2: the same logic written with reduce (either definition works).
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    return reduce(lambda c1, c2: c1 & c2, comps[1:], comps[0])

df[apply_conditions(df, conditions)]
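
Another possibility, closer to the "filter both columns at once" attempt in the question, is to compare the selected columns against a Series labeled with the same column names and then require every comparison to hold in each row. This is only a sketch: the criteria dict and the filter_equal helper are hypothetical names introduced here, not something from either answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})

# Hypothetical input format: one {column: required value} mapping.
criteria = {'col1': 'foo', 'col2': 'bar'}

def filter_equal(df, criteria):
    """Return the rows where every column in `criteria` equals its value."""
    wanted = pd.Series(criteria)  # index holds the column names
    # DataFrame == Series matches the Series index against the column labels,
    # giving one boolean column per criterion; all(axis=1) ANDs them per row.
    mask = (df[list(criteria)] == wanted).all(axis=1)
    return df[mask]

result = filter_equal(df, criteria)
print(result)
```

Keeping the conditions in a single mapping means the number of columns can vary without changing the filtering code.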
