Joh*_*ier 23 python csv numpy python-3.x pandas
I have some CSV text data inside a package that I want to read with read_csv. This is how I was doing it:
from pkgutil import get_data
from StringIO import StringIO  # Python 2 only
from pandas import read_csv
data = read_csv(StringIO(get_data('package.subpackage', 'path/to/data.csv')))
However, StringIO.StringIO is gone in Python 3, and io.StringIO only accepts Unicode. Is there a simple way to do this?
Edit: the following does not seem to work:
import numpy as np
import pandas as pd
import pkgutil
from io import StringIO
def get_data_file(pkg, path):
    f = StringIO()
    contents = unicode(pkgutil.get_data('pymc.examples', 'data/wells.dat'))
    f.write(contents)
    return f
wells = get_data_file('pymc.examples', 'data/wells.dat')
data = pd.read_csv(wells, delimiter=' ', index_col='id',
                   dtype={'switch': np.int8})
It fails with:
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 209, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 509, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 611, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 893, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 441, in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3940)
  File "parser.pyx", line 551, in pandas._parser.TextReader._get_header (pandas/src/parser.c:5096)
pandas._parser.CParserError: Passed header=0 but only 0 lines in file
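A likely reason for the "only 0 lines in file" message: after f.write(contents) the buffer's position sits at the end of the data, so anything that reads from the returned buffer finds nothing left. The minimal sketch below shows that behaviour with io.StringIO alone; the sample text is made up for illustration and is not the contents of wells.dat.

```python
import io

buf = io.StringIO()
buf.write("id switch\n1 1\n")  # made-up sample text, not the real wells.dat

# After write(), the position is at the end, so a read returns an empty string.
at_end = buf.read()

# Rewinding makes the full contents readable again.
buf.seek(0)
from_start = buf.read()

print(repr(at_end))
print(repr(from_start))
```

Calling f.seek(0) before returning the buffer, or passing the text to the StringIO constructor instead of writing it afterwards, avoids the empty read.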
DSM*_*DSM 30
The following works for me in Python 3.3:
>>> import numpy as np, pandas as pd
>>> import io, pkgutil
>>> wells = pkgutil.get_data('pymc.examples', 'data/wells.dat')
>>> type(wells)
<class 'bytes'>
>>> df = pd.read_csv(io.BytesIO(wells), encoding='utf8', sep=" ", index_col="id", dtype={"switch": np.int8})
>>> df.head()
    switch  arsenic       dist  assoc  educ
id                                         
1        1     2.36  16.826000      0     0
2        1     0.71  47.321999      0     0
3        0     2.07  20.966999      0    10
4        1     1.15  21.486000      0    12
5        1     1.10  40.874001      1    14
[5 rows x 5 columns]
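NB: I had to put wells.dat in that location manually, so I can't swear that I copied it correctly and that there is no trailing whitespace, because I removed some. But passing read_csv a BytesIO object and an encoding argument should work. (Actually, you could probably get away without the encoding, but it's a good habit. io.TextIOWrapper might be another option.)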
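As a rough sketch of the io.TextIOWrapper option mentioned above, the snippet below wraps the bytes in a BytesIO and puts a decoding text layer on top, and also shows decoding up front into a StringIO. The raw bytes are a made-up stand-in for what pkgutil.get_data() returns, and the names text_stream, df_wrapped and df_decoded are only illustrative.

```python
import io

import pandas as pd

# Made-up stand-in for the bytes returned by pkgutil.get_data(); not the real wells.dat.
raw = b"id switch arsenic\n1 1 2.36\n2 1 0.71\n3 0 2.07\n"

# Option 1: a binary buffer with a text layer that decodes while reading.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf8")
df_wrapped = pd.read_csv(text_stream, sep=" ", index_col="id")

# Option 2: decode the bytes first and hand read_csv an in-memory text buffer.
df_decoded = pd.read_csv(io.StringIO(raw.decode("utf8")), sep=" ", index_col="id")

print(df_wrapped)
```

Both routes should produce the same frame; which one to use is mostly a matter of whether you prefer to decode eagerly or let the wrapper handle it.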
Ped*_*ito 21
To pass a string to the pandas read_csv() function, you can use io.StringIO, i.e.:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("csv string..."))
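For a version that runs as-is, here is a small sketch with a concrete CSV string in place of the placeholder. The column names and values are invented for the example.

```python
from io import StringIO

import pandas as pd

# Invented sample data; any comma-separated text with a header row works the same way.
csv_text = "id,name,score\n1,alice,3.5\n2,bob,4.0\n"

# StringIO turns the str into a file-like object that read_csv can consume.
df = pd.read_csv(StringIO(csv_text), index_col="id")

print(df)
```

Note that this applies when you already hold a str; bytes, such as those returned by pkgutil.get_data() in Python 3, need to be decoded first or wrapped in io.BytesIO.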