pandas DataFrame str.contains() AND operation

Aer*_*rin 18 python string dataframe pandas

df (a pandas DataFrame) has three rows:

col_name
"apple is delicious"
"banana is delicious"
"apple and banana both are delicious"

df.col_name.str.contains("apple|banana")

matches all rows:

"apple is delicious",
"banana is delicious",
"apple and banana both are delicious".

How can I apply an AND operator with the str.contains method so that it only matches strings that contain both apple and banana?

"apple and banana both are delicious"

I want to match strings that contain 10-20 different words (grape, watermelon, berry, orange, ..., etc.).

fly*_*all 17

You can do it like this:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
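The `&` approach extends to a list of 10-20 words by building one mask per word and AND-ing them together with `functools.reduce`. A minimal sketch, assuming the column is named `col_name` as in the question; `regex=False` is used so each word is matched as a literal substring:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

words = ['apple', 'banana']  # extend with as many words as needed

# One boolean mask per word; regex=False treats each word as a literal substring.
masks = [df['col_name'].str.contains(w, regex=False) for w in words]

# AND all masks together: equivalent to masks[0] & masks[1] & ...
combined = reduce(operator.and_, masks)
result = df[combined]
print(result)
```

Note that `reduce` raises a `TypeError` if `words` is empty, so guard against that case if the list can be empty.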


Ale*_*der 16

import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ['apple', 'banana']

# Any word from `targets` is present in the sentence.
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets))
0    True
1    True
2    True
Name: col, dtype: bool

# All words from `targets` are present in the sentence.
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets))
0    False
1    False
2     True
Name: col, dtype: bool
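`word in sentence` is a substring test, so `'apple'` also matches `'pineapple'`. If whole words are wanted, one option (a variant, not part of the answer above) is to split each sentence on whitespace and use a set subset check. A sketch, assuming words are separated by spaces and carry no attached punctuation (`'apple,'` would not match):

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "pineapple and banana"]})

targets = {'apple', 'banana'}

# Split each sentence on whitespace and require every target to be one of the words.
# Unlike `word in sentence`, this does not match "apple" inside "pineapple".
mask = df.col.apply(lambda sentence: targets.issubset(sentence.split()))
print(mask)
```

With the substring version, the last row would also be `True`.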


Anz*_*zel 11

You can also use a regular expression:

df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]

You can then build the regex string from a list of words, like this:

base = r'^{}'
expr = '(?=.*{})'
words = ['apple', 'banana', 'cat']  # example
base.format(''.join(expr.format(w) for w in words))

which produces:

'^(?=.*apple)(?=.*banana)(?=.*cat)'

This lets you build the pattern dynamically for any number of words.
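The builder above inserts each word into the pattern unchanged, so a word containing regex metacharacters (for example `c++` or `3.5`) would be misinterpreted. A minimal sketch that escapes each word with `re.escape` first; the helper name `all_words_pattern` is hypothetical:

```python
import re

import pandas as pd

def all_words_pattern(words):
    """Build '^(?=.*w1)(?=.*w2)...' with each word regex-escaped (hypothetical helper)."""
    return '^' + ''.join('(?=.*{})'.format(re.escape(w)) for w in words)

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

pattern = all_words_pattern(['apple', 'banana'])
print(pattern)  # ^(?=.*apple)(?=.*banana)
print(df[df['col_name'].str.contains(pattern)])
```

Since `.` does not match a newline by default, cells containing line breaks may need `flags=re.DOTALL` passed to `str.contains` so each lookahead scans the whole cell.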


小智 5

This works:

df.col.str.contains(r'(?=.*apple)(?=.*banana)',regex=True)
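This pattern has no `^` anchor, yet it still selects the right rows: `str.contains` searches anywhere in the string, and a zero-width lookahead match at position 0 already counts as a hit. A small check on the example data, assuming the column is named `col` as in the earlier answer:

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

unanchored = df.col.str.contains(r'(?=.*apple)(?=.*banana)', regex=True)
anchored = df.col.str.contains(r'^(?=.*apple)(?=.*banana)', regex=True)

# Both select the same rows; the ^ only stops the search from retrying
# the lookaheads at every later position when a row does not match.
print(unanchored.equals(anchored))  # True
print(unanchored.tolist())          # [False, False, True]
```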


Ser*_*rov 5

If you want to use only built-in methods and avoid writing regular expressions, here is a vectorized version that doesn't involve a lambda:

import numpy as np

targets = ['apple', 'banana', 'strawberry']
# Build a list, not a generator: NumPy deprecated passing generators to np.vstack.
fruit_masks = [df['col'].str.contains(word) for word in targets]
combined_mask = np.vstack(fruit_masks).all(axis=0)
df[combined_mask]
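A pandas-only variant (a swapped-in technique, not the answer's own) can stack the per-word masks with `pd.concat` and reduce them with `DataFrame.all(axis=1)`, which keeps the result aligned to `df`'s index and needs no NumPy import. A sketch on the example data, assuming the column is named `col`:

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ['apple', 'banana']

# One column per target word; each cell says whether that word occurs in the row.
masks = pd.concat({w: df['col'].str.contains(w, regex=False) for w in targets}, axis=1)

# A row qualifies only if every column is True.
combined_mask = masks.all(axis=1)
print(df[combined_mask])
```

The intermediate `masks` frame is also handy for inspecting which of the words each row contains.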