Gau*_*sal 6 python csv nan pandas
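My question is related to this one. I have a file named "test.csv" in which "NA" is a value of region. I want to read it as "NA", not "NaN". However, other columns in test.csv have missing values, and I want to keep those as "NaN". How can I do this?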
# test.csv looks like this:
Here is what I have tried:
import pandas as pd
# This reads NA as NaN
df = pd.read_csv('test.csv')
df
region date expenses
0 NaN 1/1/2019 53
1 EU 1/2/2019 NaN
# This reads NA as NA, but doesn't read missing expense as NaN
df = pd.read_csv('test.csv', keep_default_na=False, na_values='_')
df
region date expenses
0 NA 1/1/2019 53
1 EU 1/2/2019
# What I want:
region date expenses
0 NA 1/1/2019 53
1 EU 1/2/2019 NaN
The problem with adding the keep_default_na=False argument is that the second value of expenses is not read in as NaN. So if I then try pd.isnull(df['expenses'][1]), it returns False.
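The behaviour described above can be reproduced with a small in-memory file. This is only a sketch: the contents of csv_text are an assumed stand-in for test.csv, which is not shown in the question.

```python
import io
import pandas as pd

# Assumed stand-in for test.csv; the real file contents are not shown above.
csv_text = "region,date,expenses\nNA,1/1/2019,53\nEU,1/2/2019,\n"

# With keep_default_na=False, only "_" counts as missing here,
# so the empty expenses cell is kept as an empty string.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values="_")

second_expense = df["expenses"][1]
print(repr(second_expense))
print(pd.isnull(second_expense))
```

Because the empty string is no longer in the list of missing-value markers, pandas has no reason to convert that cell to NaN.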
This works for me:
df = pd.read_csv('file.csv', keep_default_na=False, na_values=[''])
This gives:
region date expenses
0 NA 1/1/2019 53.0
1 EU 1/2/2019 NaN
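A self-contained way to check this approach, assuming a comma-separated file shaped like the one in the question (csv_text below is a hypothetical stand-in, not the actual test.csv):

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; the layout is an assumption.
csv_text = "region,date,expenses\nNA,1/1/2019,53\nEU,1/2/2019,\n"

# Turn off the default missing-value markers (which include "NA"),
# then treat only the empty string as missing.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

regions = df["region"].tolist()
expenses_missing = df["expenses"].isna().tolist()
print(regions)
print(expenses_missing)
```

Note that this also stops other default markers such as "NaN" or "null" from being treated as missing anywhere in the file, which is the reason for the caution expressed next.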
But I would rather play it safe, since other columns may contain other NaN markers, and do this instead:
df = pd.read_csv('file.csv')
df['region'] = df['region'].fillna('NA')
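One caveat with the fillna approach is that it cannot tell a literal "NA" apart from a region cell that was genuinely empty, since both are NaN after the default read. The sketch below illustrates this with an assumed three-row file; the third row with an empty region is hypothetical and not part of the original data.

```python
import io
import pandas as pd

# Hypothetical data: row 0 has the literal region "NA", row 2 has an empty region.
csv_text = (
    "region,date,expenses\n"
    "NA,1/1/2019,53\n"
    "EU,1/2/2019,\n"
    ",1/3/2019,10\n"
)

# Default parsing: "NA" and the empty cell both become NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Restore "NA" in the region column only; other columns keep their NaN values.
df["region"] = df["region"].fillna("NA")

regions = df["region"].tolist()
expenses_missing = df["expenses"].isna().tolist()
print(regions)
print(expenses_missing)
```

If the region column never has truly empty cells, this is not a concern and the approach keeps the default missing-value handling for every other column.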