Searching a Pandas column for substrings from another column

Question

Chr*_*ngs 4 python string substring dataframe pandas

I have a sample .csv file, imported as the DataFrame df, that looks like this:

    Ethnicity, Description
  0 French, Irish Dance Company
  1 Italian, Moroccan/Algerian
  2 Danish, Company in Netherlands
  3 Dutch, French
  4 English, EnglishFrench
  5 Irish, Irish-American

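With pandas, I want to check whether test1['Description'] contains any of the strings in test1['Ethnicity']. This should return rows 0, 3, 4, and 5, because their description strings contain a string from the Ethnicity column.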

So far, I have tried:

df[df['Ethnicity'].str.contains('French')]['Description']

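This returns the matches for one specific string, but I want to go through all of them without searching for each ethnicity value separately. I also tried converting the columns to lists and iterating over them, but I couldn't find a way to return the rows, since the result is no longer a DataFrame.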

Thanks in advance!

Answer 1

jez*_*ael 5

You can use str.contains with the values of the Ethnicity column converted with tolist and then joined with |, which is the regex "or":

print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish

mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0     True
1    False
2    False
3     True
4     True
5     True
Name: Description, dtype: bool

#boolean-indexing
print (df[mask])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
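
For anyone who wants to reproduce the steps above without the CSV file, here is a minimal self-contained sketch. It assumes the DataFrame is built by hand from the sample rows in the question; the variable names pattern and matched are introduced only for illustration.

```python
import pandas as pd

# Sample rows copied from the question (built by hand instead of read from CSV).
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# One alternation pattern: French|Italian|Danish|Dutch|English|Irish
pattern = "|".join(df["Ethnicity"])

# True where Description contains at least one of the Ethnicity values,
# regardless of which row that value came from.
mask = df["Description"].str.contains(pattern)

# Boolean indexing keeps only the matching rows.
matched = df[mask]
print(matched.index.tolist())  # [0, 3, 4, 5]
```

Note that row 0 matches because its description contains "Irish", a value from row 5, so this is an "any value in the column" check rather than a row-by-row comparison.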

It looks like you can omit tolist():

print (df[df.Description.str.contains('|'.join(df.Ethnicity))])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
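
One caveat with joining raw values into a regex: if a value contains regex metacharacters such as parentheses, or if a description is missing, the plain join may not behave as intended. The sketch below is one possible way to guard against that, using re.escape and the na parameter of str.contains. The data here is hypothetical and not from the question; it was chosen only to trigger both cases.

```python
import re

import pandas as pd

# Hypothetical data: one value contains parentheses (regex metacharacters)
# and one description is missing.
df = pd.DataFrame({
    "Ethnicity": ["French (Canadian)", "Dutch", "Irish"],
    "Description": ["Irish Dance Company", "French (Canadian) heritage", None],
})

# Escape each value so it is matched literally, then join with | (regex "or").
safe_pattern = "|".join(re.escape(value) for value in df["Ethnicity"])

# na=False treats missing descriptions as non-matches instead of NaN,
# so the mask can be used directly for boolean indexing.
safe_mask = df["Description"].str.contains(safe_pattern, na=False)

safe_matched = df[safe_mask]
print(safe_matched.index.tolist())  # [0, 1]
```

Without the escaping, the parentheses would be read as a regex group, so "French (Canadian)" would not be matched as a literal string.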
