Exact matching of strings in pandas (Python)

Question


Abu*_*bul (score 5). Tags: regex, excel, python-2.7, pandas

I have a column in a DataFrame, e.g. df:

  A
0 Good to 1. Good communication EI : tathagata.kar@ae.com
1 SAP ECC Project System  EI: ram.vaddadi@ae.com
2 EI : ravikumar.swarna  Role:SSE  Minimum Skill

I have a list of strings:

ls=['tathagata.kar@ae.com','a.kar@ae.com']

Now, if I filter the DataFrame with each of them:

for i in range(len(ls)):
    df1 = df[df['A'].str.contains(ls[i])]
    if len(df1.columns != 0):
        print ls[i]

I get the output:

tathagata.kar@ae.com 
a.kar@ae.com

but I only want tathagata.kar@ae.com.

How can I achieve this? As you can see, I tried str.contains, but I need something that does an exact match.
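A minimal sketch reproducing the behavior, assuming a recent pandas and the three sample rows above (the `results` dict is only for illustration). `str.contains` flags a row whenever the string occurs anywhere inside the cell, and 'a.kar@ae.com' is literally a substring of 'tathagata.kar@ae.com', so both list entries match row 0. Passing `regex=False` makes it a plain substring test; with the default regex mode the `.` would additionally match any character.

```python
import pandas as pd

# Sample data copied from the question's column A
df = pd.DataFrame({'A': [
    'Good to 1. Good communication EI : tathagata.kar@ae.com',
    'SAP ECC Project System  EI: ram.vaddadi@ae.com',
    'EI : ravikumar.swarna  Role:SSE  Minimum Skill',
]})
ls = ['tathagata.kar@ae.com', 'a.kar@ae.com']

results = {}
for s in ls:
    # plain substring test: True wherever s occurs inside a cell
    results[s] = df['A'].str.contains(s, regex=False).tolist()
    print(s, results[s])
# tathagata.kar@ae.com [True, False, False]
# a.kar@ae.com [True, False, False]
```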

Answer 1

L. *_*rez (score 3)

You can simply use ==:

string_a == string_b

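It returns True if the two strings are equal. That does not solve your problem, though: each cell contains the address embedded in longer text, so comparing a whole cell with == never succeeds.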
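To illustrate why `==` alone does not help here, a small sketch on two of the sample rows (variable names are illustrative). Comparing a pandas Series with a string is element-wise and tests each whole cell for equality, so an address embedded in longer text never compares equal.

```python
import pandas as pd

df = pd.DataFrame({'A': [
    'Good to 1. Good communication EI : tathagata.kar@ae.com',
    'SAP ECC Project System  EI: ram.vaddadi@ae.com',
]})

# == compares each whole cell with the string
whole_cell = (df['A'] == 'tathagata.kar@ae.com')
print(whole_cell.any())  # False

# == only succeeds when the two strings are identical
print('tathagata.kar@ae.com' == 'tathagata.kar@ae.com')  # True
```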

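Edit 2: Use len(df1.index) instead of len(df1.columns): len(df1.columns) gives you the number of columns, not the number of rows. Also keep the comparison outside len(), i.e. len(df1.index) != 0.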
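A quick check of the difference, assuming a one-row DataFrame taken from the sample data and a hypothetical address ('no.such@ae.com') that occurs nowhere. A boolean filter that keeps no rows still keeps every column, so `len(df1.columns)` stays at 1 while `len(df1.index)` drops to 0; pandas' `DataFrame.empty` is an equivalent test.

```python
import pandas as pd

df = pd.DataFrame({'A': ['EI : ravikumar.swarna  Role:SSE  Minimum Skill']})

# A filter that matches no rows still keeps all the columns
df1 = df[df['A'].str.contains('no.such@ae.com', regex=False)]

print(len(df1.columns))  # 1 (number of columns, unaffected by the filter)
print(len(df1.index))    # 0 (number of matching rows)
print(df1.empty)         # True
```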

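Edit 3: After reading your second post, I understand your problem. The solution you propose can still produce false matches. For example, if you have: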

ls=['tathagata.kar@ae.com','a.kar@ae.com', 'tathagata.kar@ae.co']

the first and the third element will both match str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]), which is not what you want.
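The sketch below reproduces this with the three-element list, assuming the sample DataFrame from the question. It adds `re.escape` (not in the original pattern) so the dots in the addresses are matched literally; the helper name `matches_prefix` is hypothetical. Only the left boundary is checked, so 'a.kar@ae.com' is correctly rejected (it is preceded by the letter "t"), but the truncated 'tathagata.kar@ae.co' still matches.

```python
import re
import pandas as pd

df = pd.DataFrame({'A': [
    'Good to 1. Good communication EI : tathagata.kar@ae.com',
    'SAP ECC Project System  EI: ram.vaddadi@ae.com',
    'EI : ravikumar.swarna  Role:SSE  Minimum Skill',
]})
ls = ['tathagata.kar@ae.com', 'a.kar@ae.com', 'tathagata.kar@ae.co']

def matches_prefix(s):
    # left boundary only: whitespace, start of string, or an "EI:"-style label
    pattern = r'(?:\s|^|Ei:|EI:|EI-)' + re.escape(s)
    return bool(df['A'].str.contains(pattern).any())

prefix_only = [s for s in ls if matches_prefix(s)]
print(prefix_only)  # ['tathagata.kar@ae.com', 'tathagata.kar@ae.co']
```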

To prevent this, also check what follows the address: str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')

Like this:

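import re

for i in range(len(ls)):
    # re.escape makes the '.' in the address match a literal dot
    df1 = df[df['A'].str.contains(r'(?:\s|^|Ei:|EI:|EI-)' + re.escape(ls[i]) + r'(?:\s|$)')]
    if len(df1.index) != 0:  # at least one row matched
        print (ls[i])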

(In Python 2.7 you can drop the parentheses and write print ls[i]; with a single argument the parenthesized form also works there.)
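An alternative, not part of the answer above: instead of building one regex per address, split every cell into tokens and test each address for exact membership in the resulting set. This sketch assumes addresses are separated from the surrounding text by whitespace or a colon. Unlike the answer's pattern, it does not treat a leading "EI-" as a separator, so such cells would need a different split pattern.

```python
import re
import pandas as pd

df = pd.DataFrame({'A': [
    'Good to 1. Good communication EI : tathagata.kar@ae.com',
    'SAP ECC Project System  EI: ram.vaddadi@ae.com',
    'EI : ravikumar.swarna  Role:SSE  Minimum Skill',
]})
ls = ['tathagata.kar@ae.com', 'a.kar@ae.com', 'tathagata.kar@ae.co']

# Assumed tokenization: split each cell on runs of whitespace or colons
tokens = set()
for cell in df['A']:
    tokens.update(re.split(r'[\s:]+', cell))

# Set membership compares whole tokens, i.e. exact string equality
matches = [s for s in ls if s in tokens]
print(matches)  # ['tathagata.kar@ae.com']
```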