Mik*_*and 1 python csv namedtuple python-3.x
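Using the namedtuple documentation example in Python 3.3 as my template, I have the following code to download a CSV and turn it into a series of namedtuple subclass instances: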
from collections import namedtuple
from csv import reader
from urllib.request import urlopen
SecurityType = namedtuple('SecurityType', 'sector, name')
url = 'http://bsym.bloomberg.com/sym/pages/security_type.csv'
for sec in map(SecurityType._make, reader(urlopen(url))):
    print(sec)
This raises the following exception:
Traceback (most recent call last):
  File "scrap.py", line 9, in <module>
    for sec in map(SecurityType._make, reader(urlopen(url))):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
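To reproduce the error without network access, here is a minimal sketch that swaps in an in-memory `io.BytesIO` as a hypothetical stand-in for the HTTP response (the sample bytes are made up to resemble the CSV). Iterating over a binary stream yields `bytes` lines, which is exactly what `csv.reader` rejects; the exact wording of the message may differ between Python versions.

```python
import csv
import io

# Hypothetical stand-in for urlopen(url): iterating over a binary stream
# yields bytes lines, just as iterating over the HTTP response does.
fake_response = io.BytesIO(b"Market Sector,Security Type\r\nComdty,Calendar Spread Option\r\n")

error = None
try:
    rows = list(csv.reader(fake_response))
except csv.Error as exc:
    # csv.reader only accepts an iterator of str, so the first bytes line fails.
    error = exc
    print("csv.Error:", exc)
```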
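I know the problem is that urlopen returns bytes rather than strings, and that I need to decode the output at some point. Here is how I'm currently doing it, using StringIO: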
from collections import namedtuple
from csv import reader
from urllib.request import urlopen
import io
SecurityType = namedtuple('SecurityType', 'sector, name')
url = 'http://bsym.bloomberg.com/sym/pages/security_type.csv'
reader_input = io.StringIO(urlopen(url).read().decode('utf-8'))
for sec in map(SecurityType._make, reader(reader_input)):
    print(sec)
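This smells funny, because I'm essentially iterating over the byte buffer, decoding it, re-buffering it, and then iterating over the new string buffer. Is there a more Pythonic way to do this without iterating twice?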
Use io.TextIOWrapper() to decode the urllib response as it is read:
reader_input = io.TextIOWrapper(urlopen(url), encoding='utf8', newline='')
Now csv.reader is handed exactly the same interface it would get from a regular file on the filesystem opened in text mode.
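A minimal offline sketch of the same idea, assuming an in-memory `io.BytesIO` as a hypothetical stand-in for `urlopen(url)` with a few made-up lines shaped like the Bloomberg CSV. The wrapper decodes lines on demand as `csv.reader` pulls them, so there is no separate decode-and-rebuffer pass.

```python
import csv
import io
from collections import namedtuple

SecurityType = namedtuple('SecurityType', 'sector, name')

# Hypothetical stand-in for urlopen(url): an in-memory binary stream.
raw = io.BytesIO(
    b"Market Sector,Security Type\r\n"
    b"Comdty,Calendar Spread Option\r\n"
    b"Pfd,PUBLIC\r\n"
)

# Decode lazily while csv.reader iterates; newline='' leaves line endings
# untranslated so the csv module can handle them itself.
reader_input = io.TextIOWrapper(raw, encoding='utf8', newline='')

securities = [SecurityType._make(row) for row in csv.reader(reader_input)]
for sec in securities:
    print(sec)
```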
With this change, your example URL works for me on Python 3.3.1:
>>> for sec in map(SecurityType._make, reader(reader_input)):
...     print(sec)
...
SecurityType(sector='Market Sector', name='Security Type')
SecurityType(sector='Comdty', name='Calendar Spread Option')
SecurityType(sector='Comdty', name='Financial commodity future.')
SecurityType(sector='Comdty', name='Financial commodity generic.')
SecurityType(sector='Comdty', name='Financial commodity option.')
...
SecurityType(sector='Muni', name='ZERO COUPON, OID')
SecurityType(sector='Pfd', name='PRIVATE')
SecurityType(sector='Pfd', name='PUBLIC')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
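The last few rows produce tuples whose fields are all empty strings; the original file really does contain some lines consisting only of commas.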
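If those comma-only rows are unwanted, one option (a sketch, not part of the original answer) is to skip any row whose fields are all empty before building the namedtuples. The sample bytes below are hypothetical, mimicking the tail of the file.

```python
import csv
import io
from collections import namedtuple

SecurityType = namedtuple('SecurityType', 'sector, name')

# Hypothetical sample ending in comma-only lines, like the original file.
raw = io.BytesIO(b"Pfd,PRIVATE\r\nPfd,PUBLIC\r\n,\r\n,\r\n")
reader_input = io.TextIOWrapper(raw, encoding='utf8', newline='')

# A comma-only line parses as ['', ''], and any(['', '']) is False,
# so those rows are dropped before they become SecurityType instances.
securities = [SecurityType._make(row) for row in csv.reader(reader_input) if any(row)]
for sec in securities:
    print(sec)
```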