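I want to import the following csv with the column read as strings rather than int64. Pandas read_csv automatically converts it to int64, but I need this column as a string.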
ID
00013007854817840016671868
00013007854817840016749251
00013007854817840016754630
00013007854817840016781876
00013007854817840017028824
00013007854817840017963235
00013007854817840018860166
df = read_csv('sample.csv')
df.ID
>>
0 -9223372036854775808
1 -9223372036854775808
2 -9223372036854775808
3 -9223372036854775808
4 -9223372036854775808
5 -9223372036854775808
6 -9223372036854775808
Name: ID
Unfortunately, using converters gives the same result.
df = read_csv('sample.csv', converters={'ID': str})
df.ID
>>
0 -9223372036854775808
1 -9223372036854775808
2 -9223372036854775808
3 -9223372036854775808
4 -9223372036854775808
5 -9223372036854775808
6 -9223372036854775808
Name: ID
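For reference, a minimal sketch of the result being asked for, assuming a reasonably recent pandas in which `read_csv` accepts a per-column `dtype` mapping. The in-memory `csv_text` is a stand-in for sample.csv (an assumption for illustration) and holds three of the IDs listed above.

```python
import io

import pandas as pd

# Stand-in for sample.csv (assumption): header plus three of the IDs above.
csv_text = (
    "ID\n"
    "00013007854817840016671868\n"
    "00013007854817840016749251\n"
    "00013007854817840016754630\n"
)

# Ask the parser to keep the ID column as text instead of inferring a number.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

ids = df["ID"].tolist()
print(ids)
```

Because the values never pass through an integer type, the 26-digit IDs cannot overflow and the leading zeros stay in place.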
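I am importing study data into a Pandas data frame using read_csv.
My subject codes are 6-digit numbers that encode, among other things, the date of birth. For some of my subjects this results in a code with a leading zero (e.g. "010816").
When I import into Pandas, the leading zero is stripped and the column is formatted as int64.
Is there a way to import this column as a string?
I tried using a custom converter for the column, but it doesn't work - it seems as if the custom conversion takes place before Pandas converts to int.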
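A small sketch of what the import could look like when only the code column should stay textual, assuming a pandas version whose `read_csv` honours a per-column `dtype`. The column names `subject` and `score` and the sample rows are hypothetical; only the code "010816" comes from the question.

```python
import io

import pandas as pd

# Hypothetical study data: a 6-digit subject code plus one numeric column.
csv_text = "subject,score\n010816,12\n231190,7\n"

# Only 'subject' is forced to text; 'score' is still inferred as a number.
df = pd.read_csv(io.StringIO(csv_text), dtype={"subject": str})

codes = df["subject"].tolist()
score_total = int(df["score"].sum())
print(codes, score_total)
```

Restricting the mapping to one column leaves type inference untouched for the rest of the file.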
So I'm reading a station codes csv file from NOAA which looks like this:
"USAF","WBAN","STATION NAME","CTRY","FIPS","STATE","CALL","LAT","LON","ELEV(.1M)","BEGIN","END"
"006852","99999","SENT","SW","SZ","","","+46817","+010350","+14200","",""
"007005","99999","CWOS 07005","","","","","-99999","-999999","-99999","20120127","20120127"
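The first two columns contain codes for weather stations, and sometimes they have leading zeros. When pandas imports them without a dtype specified, they turn into integers. It isn't really a big deal, because I can loop through the dataframe index and replace them with something like "%06d" % i, since they're always six digits, but you know... that's the lazy man's way.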
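The padding workaround mentioned above can be sketched without an explicit loop. This assumes the column has already been read as integers; the small frame below is built by hand from the two sample rows and is not the real NOAA file.

```python
import pandas as pd

# Hand-built stand-in for the imported file: the codes have lost their zeros.
df = pd.DataFrame({"USAF": [6852, 7005], "WBAN": [99999, 99999]})

# Convert to text and left-pad with zeros to the fixed width of six digits.
df["USAF"] = df["USAF"].astype(str).str.zfill(6)

usaf_codes = df["USAF"].tolist()
print(usaf_codes)
```

This only works because the width is known in advance; it cannot recover codes whose original length varied.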
The csv is fetched with the following code:
file = urllib.urlopen(r"ftp://ftp.ncdc.noaa.gov/pub/data/inventories/ISH-HISTORY.CSV")
output = open('Station Codes.csv','wb')
output.write(file.read())
output.close()
That's all well and good, but when I try to read it using this:
import pandas as pd
df = pd.io.parsers.read_csv("Station Codes.csv",dtype={'USAF': np.str, 'WBAN': np.str})
or
import pandas as pd
df = pd.io.parsers.read_csv("Station Codes.csv",dtype={'USAF': str, 'WBAN': str})
I get a nasty error message:
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 216, in _read
    return parser.read()
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 633, in read
    ret = self._engine.read(nrows)
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 957, in read
    data …
After reading data from a txt file, I have a data frame (df1) that looks like this:
name l1 l2
a 00000 00000
b 00010 00002
c 00000 01218
When I write it out with the following Python code:
dataframe.to_csv('test.csv', index= False)
and then read it back with the following code:
df = pd.read_csv('test.csv')
I find that the resulting data frame, df2, looks like this:
name l1 l2
a 0 0
b 10 2
c 0 1218
But I want to keep the leading zeros in the data frame, as in df1.
Thanks!
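A minimal sketch of the round trip with the zeros preserved, assuming the columns of df1 hold strings and that passing `dtype=str` to `read_csv` is acceptable for every column. An in-memory buffer stands in for test.csv so the example is self-contained.

```python
import io

import pandas as pd

# df1 as described above, with the codes stored as strings.
df1 = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "l1": ["00000", "00010", "00000"],
        "l2": ["00000", "00002", "01218"],
    }
)

# Write to an in-memory buffer instead of test.csv (assumption for the demo).
buf = io.StringIO()
df1.to_csv(buf, index=False)
buf.seek(0)

# Read every column back as text so no numeric inference strips the zeros.
df2 = pd.read_csv(buf, dtype=str)

l1_values = df2["l1"].tolist()
l2_values = df2["l2"].tolist()
print(df2)
```

The zeros are written to the file correctly by `to_csv`; they are lost only when the reader infers an integer type, which is why the fix sits on the `read_csv` side.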