Pandas: unable to filter based on string equality

Question

vpk*_*vpk 8 python string filtering selection pandas

I am using pandas 0.16.2 with Python 2.7 on OS X.

I read a DataFrame from a CSV file as follows:

import pandas as pd

data = pd.read_csv("my_csv_file.csv",sep='\t', skiprows=(0), header=(0))

The output of data.dtypes is:

name       object
weight     float64
ethnicity  object
dtype: object

I expected name and ethnicity to have a string type, but I found the explanation here for why they show up as "object" in newer pandas versions.

Now I want to select rows based on ethnicity, for example:

data[data['ethnicity']=='Asian']
Out[3]: 
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []

I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].

However, when I try the following:

data[data['ethnicity'].str.contains('Asian')].head(3)

I get the result I want.

However, I don't want to use "contains"; I want to check for exact equality.

Note that data[data['ethnicity'].str=='Asian'] raises an error.

Am I doing something wrong? How can I achieve this?

Answer 1

unu*_*tbu 9

There may be whitespace in your strings, for example:

data = pd.DataFrame({'ethnicity':[' Asian', '  Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', '  Asian']
print(data[data['ethnicity'].str.contains('Asian')])

yields

  ethnicity
0     Asian
1     Asian

To strip leading or trailing whitespace from the strings, you can use

data['ethnicity'] = data['ethnicity'].str.strip()

After that,

data.loc[data['ethnicity'] == 'Asian']

yields

  ethnicity
0     Asian
1     Asian
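
To confirm that stray whitespace really is the cause before changing the data, something like the following sketch may help. The sample values are made up for illustration and stand in for the column read from the CSV file; `repr()` is used only to make the padding visible.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's 'ethnicity' column.
data = pd.DataFrame({'ethnicity': [' Asian', 'Asian ', 'Asian', 'Other']})

# repr() wraps each value in quotes, so leading/trailing spaces become visible.
raw_values = [repr(v) for v in data['ethnicity'].unique()]
print(raw_values)

# A value that changes when stripped must contain leading or trailing whitespace.
has_padding = data['ethnicity'] != data['ethnicity'].str.strip()
padded_count = int(has_padding.sum())
print(padded_count)
```

If the count is greater than zero, the `==` comparison fails for those rows because of the padding, not because of the object dtype.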

Answer 2

Dan*_*tin 5

You can try this:

data[data['ethnicity'].str.strip()=='Asian']
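
As a rough illustration of how this differs from `str.contains`, the sketch below uses made-up rows (the 'Asian American' value is an assumption, not taken from the question's file). Stripping inside the comparison leaves the stored column unchanged and matches only exact values.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
data = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'ethnicity': [' Asian', 'Asian American', 'Asian '],
})

# Substring match: also selects 'Asian American'.
contains_rows = data[data['ethnicity'].str.contains('Asian')]

# Exact match on the stripped value: selects only the padded 'Asian' rows.
equal_rows = data[data['ethnicity'].str.strip() == 'Asian']

print(contains_rows)
print(equal_rows)
```

Because the strip happens only in the boolean mask, the original values (including their padding) stay in `data`; assign the stripped column back if the cleanup should be permanent.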
