Filter rows if any column in a DataFrame contains any value from a list of substrings

Question

Joe*_*Joe 5 python substring dataframe pandas

Suppose I have a DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4, 5],
                   'Name': ['A', 'B', 100, 'C', 'D'],
                   'col1': [np.nan, 'bbby', 'cccy', 'dddy', 'EEEEE'],
                   'col2': ['water', np.nan, 'WATER', 'soil', 'cold air'],
                   'col3': ['watermelone', 'hot AIR', 'air conditioner', 'drink', 50000],
                   'Results': [1000, 2000, 3000, 4000, 5000]})


Out

Index  Name  col1   col2      col3             Results
    1  A     NaN    water     watermelone         1000
    2  B     bbby   NaN       hot AIR             2000
    3  100   cccy   WATER     air conditioner     3000
    4  C     dddy   soil      drink               4000
    5  D     EEEEE  cold air  50000               5000

I have a list: matches = ['wat','air']

How can I select all rows where col1, col2, or col3 contains any item i in matches?

Expected output:

Index  Name  col1   col2      col3             Results
    1  A     NaN    water     watermelone         1000
    2  B     bbby   NaN       hot AIR             2000
    3  100   cccy   WATER     air conditioner     3000
    5  D     EEEEE  cold air  50000               5000

Answer 1

Dav*_*son 0

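You can transpose the DataFrame with .T, check the values column by column with str.contains, and then transpose back. (str.contains also accepts several values at once if they are separated by |, which is why I turned the list into a string with matches = '|'.join(matches).)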
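To make the `|` trick concrete, here is a minimal sketch on a single Series. The sample values are illustrative, and `re.escape`, `case=False` and `na=False` are my additions rather than part of the answer; they guard against regex metacharacters, mixed case and missing values respectively.

```python
import re
import pandas as pd

matches = ['wat', 'air']
# re.escape is an added precaution (assumption): it keeps characters such as
# '.' or '+' in a substring from being read as regex syntax.
pattern = '|'.join(re.escape(m) for m in matches)  # 'wat|air'

s = pd.Series(['water', 'hot AIR', 'soil', None])
# case=False ignores case; na=False counts missing values as "no match".
hits = s.str.contains(pattern, case=False, na=False)
print(hits.tolist())  # [True, True, False, False]
```

With `case=False` there is no need for a separate `.str.lower()` step.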

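The benefit of transposing the DataFrame is that you can use column-wise pandas methods instead of looping over rows or writing a long lambda x: list comprehension. This technique should have good performance compared to answers that use lambda x with axis=1: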

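# df = df.set_index('Index')
matches = ['wat', 'air']
matches = '|'.join(matches)  # regex alternation: 'wat|air'
# Cast every cell to str first; otherwise numeric cells become NaN under .str
df = df.reset_index(drop=True).fillna('').astype(str).T
# After the transpose each column is one original row; keep rows with any match
df = df.T[[df[col].str.lower().str.contains(matches).any() for col in df.columns]]
df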
Out[1]: 
  Name   col1      col2             col3
0    A            water      watermelone
1    B   bbbY                    hot AIR
2    B   cccY     water  air conditioner
4    D  EEEEE  cold air              eat
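For comparison, here is a sketch that skips the transpose and searches only col1, col2 and col3, as the question asks. This is an alternative I am suggesting, not the answer's method. The `apply` here runs once per column (three calls), not once per row, so it is different from the `lambda x` with `axis=1` pattern mentioned above. The names `cols`, `mask` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4, 5],
                   'Name': ['A', 'B', 100, 'C', 'D'],
                   'col1': [np.nan, 'bbby', 'cccy', 'dddy', 'EEEEE'],
                   'col2': ['water', np.nan, 'WATER', 'soil', 'cold air'],
                   'col3': ['watermelone', 'hot AIR', 'air conditioner', 'drink', 50000],
                   'Results': [1000, 2000, 3000, 4000, 5000]})

pattern = '|'.join(['wat', 'air'])
cols = ['col1', 'col2', 'col3']

# One str.contains call per column. astype(str) lets numeric cells be searched
# as text; na=False treats any remaining missing value as "no match".
mask = df[cols].apply(
    lambda s: s.astype(str).str.contains(pattern, case=False, na=False)
).any(axis=1)

result = df[mask]
print(result['Index'].tolist())  # [1, 2, 3, 5]
```

Because the original `df` is only indexed by the mask, the NaN cells and the integer columns keep their original types in `result`.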

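Hi @David Erickson, it seems the problem was that my data contains floats, which cannot be matched as strings. I added `df = df.applymap(str)`. Now it works. Thank you very much!!! (2 upvotes)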
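The float issue from this comment can be reproduced on a small Series. This is a minimal sketch with made-up values; it uses `astype(str)` as a stand-in for the `applymap(str)` call in the comment, which is my substitution and has the same effect of turning every cell into a string.

```python
import pandas as pd

s = pd.Series(['cold air', 50000, 3.5], dtype=object)

# In an object column, .str methods return NaN for cells that are not strings.
raw = s.str.contains('air')
print(raw.isna().tolist())  # [False, True, True]

# Converting to str first gives a clean boolean result for every cell.
fixed = s.astype(str).str.contains('air')
print(fixed.tolist())  # [True, False, False]
```

Those NaN values are what make a later `.any()` or boolean index unreliable, so the cast has to happen before the `.str` call.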

Archived:	5 years, 4 months ago
Views:	1099
Last activity:	3 years, 5 months ago