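I have two DataFrames, one named USERS and the other named EXCLUDE. Both have a column named "email".
Basically, I want to remove every row in USERS whose email also appears in EXCLUDE.
How can I do this?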
jez*_*ael 27
You can use boolean indexing with an isin condition; the ~ operator inverts the resulting boolean Series:
import pandas as pd
USERS = pd.DataFrame({'email':['a@g.com','b@g.com','b@g.com','c@g.com','d@g.com']})
print (USERS)
     email
0  a@g.com
1  b@g.com
2  b@g.com
3  c@g.com
4  d@g.com
EXCLUDE = pd.DataFrame({'email':['a@g.com','d@g.com']})
print (EXCLUDE)
     email
0  a@g.com
1  d@g.com
print (USERS.email.isin(EXCLUDE.email))
0     True
1    False
2    False
3    False
4     True
Name: email, dtype: bool
print (~USERS.email.isin(EXCLUDE.email))
0    False
1     True
2     True
3     True
4    False
Name: email, dtype: bool
print (USERS[~USERS.email.isin(EXCLUDE.email)])
     email
1  b@g.com
2  b@g.com
3  c@g.com
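
The steps above can be wrapped into a small reusable function. This is only a minimal sketch: the name `drop_excluded` and its `column` parameter are hypothetical, not part of pandas, and it assumes both frames share the column being compared. It returns a new frame and leaves `USERS` unchanged.

```python
import pandas as pd


def drop_excluded(users, exclude, column="email"):
    """Hypothetical helper: keep rows of `users` whose `column` value
    does not appear in `exclude[column]`."""
    # True where the value is present in the exclusion list
    mask = users[column].isin(exclude[column])
    # ~ inverts the mask; reset_index renumbers the surviving rows
    return users.loc[~mask].reset_index(drop=True)


USERS = pd.DataFrame({"email": ["a@g.com", "b@g.com", "b@g.com", "c@g.com", "d@g.com"]})
EXCLUDE = pd.DataFrame({"email": ["a@g.com", "d@g.com"]})

filtered = drop_excluded(USERS, EXCLUDE)
print(filtered)
```

Drop the `reset_index(drop=True)` call if the original row labels should be preserved, as in the output shown above.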
Another solution uses merge:
df = pd.merge(USERS, EXCLUDE, how='outer', indicator=True)
print (df)
     email     _merge
0  a@g.com       both
1  b@g.com  left_only
2  b@g.com  left_only
3  c@g.com  left_only
4  d@g.com       both
print (df.loc[df._merge == 'left_only', ['email']])
     email
1  b@g.com
2  b@g.com
3  c@g.com
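
One caveat worth sketching: if EXCLUDE contained the same email more than once, a merge would repeat the matching USERS rows. The variant below is an assumption-laden sketch, not the answer's own code. It deduplicates EXCLUDE first and uses `how='left'` so that rows found only in EXCLUDE never appear in the result. The duplicated `d@g.com` entry is made up for illustration.

```python
import pandas as pd

USERS = pd.DataFrame({"email": ["a@g.com", "b@g.com", "b@g.com", "c@g.com", "d@g.com"]})
# Illustrative data: 'd@g.com' is listed twice on purpose
EXCLUDE = pd.DataFrame({"email": ["a@g.com", "d@g.com", "d@g.com"]})

# Deduplicate the right side so each USERS row appears once in the merge
merged = USERS.merge(EXCLUDE.drop_duplicates(), on="email", how="left", indicator=True)

# 'left_only' marks rows of USERS with no match in EXCLUDE
kept = merged.loc[merged["_merge"] == "left_only", ["email"]]
print(kept)
```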
        小智 11
Just to expand on jezrael's answer, the same approach can be used to filter rows based on multiple columns.
USERS = pd.DataFrame({"email": ["a@g.com", "b@g.com", "c@g.com", 
                                "d@g.com", "e@g.com"],
                      "name": ["a", "s", "d", 
                               "f", "g"],
                      "nutrient_of_choice": ["pizza", "corn", "bread", 
                                             "coffee", "sausage"]})
print(USERS)    
     email name nutrient_of_choice
0  a@g.com    a              pizza
1  b@g.com    s               corn
2  c@g.com    d              bread
3  d@g.com    f             coffee
4  e@g.com    g            sausage
EXCLUDE = pd.DataFrame({"email":["x@g.com", "d@g.com"],
                        "name": ["a", "f"]})
print(EXCLUDE)
     email name
0  x@g.com    a
1  d@g.com    f
Now suppose we only want to filter out rows where both the name and the email match:
USERS = pd.merge(USERS, EXCLUDE, on=["email", "name"], how="outer", indicator=True)
print(USERS)
     email name nutrient_of_choice      _merge
0  a@g.com    a              pizza   left_only
1  b@g.com    s               corn   left_only
2  c@g.com    d              bread   left_only
3  d@g.com    f             coffee        both
4  e@g.com    g            sausage   left_only
5  x@g.com    a                NaN  right_only
USERS = USERS.loc[USERS["_merge"] == "left_only"].drop("_merge", axis=1)
print(USERS)
     email name nutrient_of_choice
0  a@g.com    a              pizza
1  b@g.com    s               corn
2  c@g.com    d              bread
4  e@g.com    g            sausage
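
The multi-column filter above can also be written as a helper that does not overwrite USERS. This is a sketch under stated assumptions: `anti_join` is a hypothetical name, not a pandas function, and it assumes neither frame already has a column called `_merge`. It uses `how="left"` so the `right_only` row never shows up, and it deduplicates the key columns so rows are not repeated.

```python
import pandas as pd


def anti_join(left, right, on):
    """Hypothetical helper: rows of `left` with no match in `right` on the `on` columns."""
    # Only the key columns matter; duplicates would repeat rows of `left`
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # Keep unmatched rows and drop the helper column added by indicator=True
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


USERS = pd.DataFrame({"email": ["a@g.com", "b@g.com", "c@g.com", "d@g.com", "e@g.com"],
                      "name": ["a", "s", "d", "f", "g"],
                      "nutrient_of_choice": ["pizza", "corn", "bread", "coffee", "sausage"]})
EXCLUDE = pd.DataFrame({"email": ["x@g.com", "d@g.com"],
                        "name": ["a", "f"]})

result = anti_join(USERS, EXCLUDE, on=["email", "name"])
print(result)
```

Note that merge builds a fresh integer index, so any custom index on the left frame is not carried over to the result.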