arr*_*vis 5 python dataframe pandas
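I don't think this exact question has been answered yet, so here goes.
I have a Pandas dataframe, and I want to select all rows that contain a string in column A or column B.
Say the dataframe looks like this: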
import pandas

d = {'id': ["1", "2", "3", "4"],
     'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
     'description': ["Horse epitome", "Cats bad but horses good", "Frog fancier", "Turkey tome, not about horses"],
     'tags': ["horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey"],
     'date': ["2019-01-01", "2019-10-01", "2018-08-14", "2016-11-29"]}
dataframe = pandas.DataFrame(d)
This gives:
id title description tag date
1 "Horses are good" "Horse epitome" "horse, cat" 2019-01-01
2 "Cats are bad" "Cats bad" "horse, cat" 2019-10-01
3 "Frogs are nice" "Frog fancier, horses good" "horse, frog" 2018-08-14
4 "Turkey are best" "Turkey tome" "turkey, horse" 2016-11-29
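Let's say I want to create a new dataframe with the rows that have the string horse (ignoring capitalization) in the column title or the column description, but not in the column tag (or any other column).
The result should be (rows 2 and 4 are dropped):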
id title description tag date
1 "Horses are good" "Horse epitome" "horse, cat" 2019-01-01
3 "Frogs are nice" "Frog fancier, horses good" "horse, frog" 2018-08-14
I have seen a few answers for a single column, such as:
dataframe[dataframe['title'].str.contains('horse')]
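But I'm not sure (1) how to add multiple columns to this statement and (2) how to modify it, for example with string.lower(), so that capitalization in the column values is ignored when matching.
Thanks in advance!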
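If you want to specify which columns to test, one possible solution is to join those columns together and then test the result with Series.str.contains and case=False: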
s = dataframe['title'] + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)]
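One caveat with plain concatenation: without a separator a match can straddle the boundary between the two columns, and a missing value in either column makes the joined string missing as well. Below is a minimal sketch of a more defensive variant. The sample frame and the names `df_demo` and `joined` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data, chosen to show the two edge cases.
df_demo = pd.DataFrame({
    'title': ["Horses are good", "A big hor", "Frogs are nice", None],
    'description': ["Horse epitome", "se race", "Frog fancier", "Turkey tome"],
})

# fillna('') keeps rows with a missing value testable; the ' ' separator
# stops a match from spanning the end of one column and the start of the next.
joined = df_demo['title'].fillna('') + ' ' + df_demo['description'].fillna('')
mask = joined.str.contains('horse', case=False)

result = df_demo[mask]
```

Here only the first row is kept. Without the separator, the second row would also match, because "A big hor" and "se race" join into "A big horse race".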
Or build a condition for each column and chain the conditions with bitwise OR, |:
df = dataframe[dataframe['title'].str.contains('horse', case=False) |
dataframe['description'].str.contains('horse', case=False)]
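When there are more than two columns to search, writing one condition per column gets repetitive. As a rough sketch, the same OR logic can be built from a list of column names. The list name `cols` is an assumption, and the sample data follows the table printed in the question rather than the dict.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
    'description': ["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
    'tags': ["horse, cat", "horse, cat", "horse, frog", "turkey, horse"],
})

cols = ['title', 'description']  # columns to search; an illustrative name

# Build one boolean column per searched column, then OR them row-wise.
mask = dataframe[cols].apply(
    lambda col: col.str.contains('horse', case=False, na=False)
).any(axis=1)

df = dataframe[mask]
```

With this data the rows with ids 1 and 3 remain, and the tags column is never consulted.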
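Also, if you want to specify columns that must not match, chain the solution with bitwise AND, &, and invert the condition with ~ for NOT MATCH: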
df = dataframe[s.str.contains('horse', case=False) &
~dataframe['tags'].str.contains('horse', case=False)]
For the second solution, add () around all the conditions chained with OR:
df = dataframe[(dataframe['title'].str.contains('horse', case=False) |
                dataframe['description'].str.contains('horse', case=False)) &
               ~dataframe['tags'].str.contains('horse', case=False)]
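The parentheses matter because Python evaluates & before |. The following small sketch shows the difference on made-up data where the tags vary per row. The frame `demo` and the mask names are hypothetical.

```python
import pandas as pd

# Hypothetical data: the tags differ per row so the exclusion has an effect.
demo = pd.DataFrame({
    'title': ["Horses are good", "Cats are bad", "Frogs are nice"],
    'description': ["Horse epitome", "Cats bad but horses good", "Frog fancier"],
    'tags': ["horse", "cat", "frog"],
})

in_title = demo['title'].str.contains('horse', case=False)
in_desc = demo['description'].str.contains('horse', case=False)
in_tags = demo['tags'].str.contains('horse', case=False)

# With parentheses: (title OR description) AND NOT tags.
grouped = (in_title | in_desc) & ~in_tags

# Without parentheses, & is evaluated first: title OR (description AND NOT tags).
ungrouped = in_title | in_desc & ~in_tags
```

The two masks disagree on the first row: `grouped` drops it because its tags contain "horse", while `ungrouped` keeps it because the title alone already matches.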
EDIT:
As @WeNYoBen commented, you can add DataFrame.copy at the end to prevent SettingWithCopyWarning, e.g.:
s = dataframe['title'] + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)].copy()
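In pandas versions that emit it, the warning usually appears later, when the filtered frame is modified. Here is a short sketch of that follow-up step. The column name `title_lower` is invented for the example.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'title': ["Horses are good", "Cats are bad"],
    'description': ["Horse epitome", "Cats bad"],
})

s = dataframe['title'] + dataframe['description']

# .copy() makes df an independent frame, so adding a column to it
# does not touch the original dataframe.
df = dataframe[s.str.contains('horse', case=False)].copy()
df['title_lower'] = df['title'].str.lower()  # hypothetical follow-up edit
```

After this, the new column exists only on `df`, and `dataframe` is left unchanged.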