Read a single column of a CSV file and store it in an array

dh7*_*762 5 python csv pandas

What is the best way to read a CSV file, but only one specific column, such as title?

ID | date|  title |
-------------------
  1|  2013|   abc |
  2|  2012|   cde |

That column should then be stored in an array like this:

data = ["abc", "cde"]

This is what I have so far, using pandas:

data = pd.read_csv("data.csv", index_col=2)

I have already looked into this post, but I still get IndexError: list index out of range.

Edit:

It is not a table; it is comma-separated, like this:

ID,date,title
1,2013,abc
2,2012,cde

And*_*den 11

One option is simply to read the whole CSV and then select one column:

import pandas as pd

data = pd.read_csv("data.csv")

data['title']  # as a Series
data['title'].values  # as a numpy array
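
A self-contained version of the read-then-select approach, assuming the sample data from the question is supplied through io.StringIO instead of a file on disk. Series.tolist() turns the column into a plain Python list, which matches the data = ["abc", "cde"] shape the question asks for:

```python
import io

import pandas as pd

# Sample data from the question; in practice pass the file path instead.
csv_text = "ID,date,title\n1,2013,abc\n2,2012,cde\n"

df = pd.read_csv(io.StringIO(csv_text))  # reads every column
data = df["title"].tolist()              # plain Python list of the titles
print(data)  # ['abc', 'cde']
```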

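As @dawg suggested, you can use the usecols argument. If you also squeeze the one-column result into a Series, you avoid some hackery to flatten the array of values. (Older pandas versions accepted squeeze=True directly in read_csv; that argument was removed in pandas 2.0, so call .squeeze("columns") on the result instead.)

In [11]: titles = pd.read_csv("data.csv", sep=',', usecols=['title']).squeeze("columns")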

In [12]: titles  # Series
Out[12]: 
0    abc
1    cde
Name: title, dtype: object

In [13]: titles.values  # numpy array
Out[13]: array(['abc', 'cde'], dtype=object)
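
If you need this in several places, the usecols approach can be wrapped in a small helper. read_column below is a hypothetical name, not a pandas function; it parses only the requested column and returns it as a list by indexing the column directly, which works on any pandas version:

```python
import io

import pandas as pd


def read_column(source, column):
    """Hypothetical helper: read only `column` from a CSV and return it as a list."""
    # usecols restricts parsing to the one column we need
    frame = pd.read_csv(source, usecols=[column])
    return frame[column].tolist()


titles = read_column(io.StringIO("ID,date,title\n1,2013,abc\n2,2012,cde\n"), "title")
print(titles)  # ['abc', 'cde']
```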


daw*_*awg 5

You can do this:

>>> import pandas as pd
>>> from io import StringIO
>>> txt='''\
... ID,date,title
... 1,2013,abc
... 2,2012,cde'''
>>> data=pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0]
>>> data
['abc', 'cde']
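
The .T.values.tolist()[0] chain works because a one-column frame transposes to a single row, and [0] takes that row. Selecting the column directly gives the same list with less indirection. A minimal comparison, written for Python 3's io.StringIO:

```python
from io import StringIO

import pandas as pd

txt = "ID,date,title\n1,2013,abc\n2,2012,cde"

# Original chain: transpose the 1-column frame to 1 row, then take that row
via_transpose = pd.read_csv(StringIO(txt), usecols=["title"]).T.values.tolist()[0]

# Equivalent: select the column and convert it to a list
via_column = pd.read_csv(StringIO(txt), usecols=["title"])["title"].tolist()

print(via_transpose == via_column)  # True
```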

Or, if some fields contain only whitespace:

txt='''\
ID,date,title
1,2013,abc
2,2012,cde
3,2014, 
4,2015,fgh'''
table=pd.read_csv(StringIO(txt), usecols=['title'])
print(table)
  title
0   abc
1   cde
2      
3   fgh
data=pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0]
print(data)
['abc', 'cde', ' ', 'fgh']
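
If the whitespace-only entry should be treated as missing rather than kept as ' ', one option (an assumption about what you want, not part of the answer above) is to strip the strings and drop the results that end up empty:

```python
from io import StringIO

import pandas as pd

# Same data as above; the third row's title is a single space
txt = "ID,date,title\n1,2013,abc\n2,2012,cde\n3,2014, \n4,2015,fgh"

titles = pd.read_csv(StringIO(txt), usecols=["title"])["title"].str.strip()
data = titles[titles != ""].tolist()  # keep only non-blank titles
print(data)  # ['abc', 'cde', 'fgh']
```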

Or, if rows have a variable number of fields:

txt='''\
ID,date,title
1,2013,
2,2012,cde
3
4,2015,fgh'''

print(pd.read_csv(StringIO(txt), usecols=['title']))
  title
0   NaN
1   cde
2   NaN
3   fgh

print(pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0])
[nan, 'cde', nan, 'fgh']
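
When short or empty rows produce NaN, you will usually want to drop or replace those values before building the list. A minimal sketch using dropna() or, alternatively, fillna(""), on the same data:

```python
from io import StringIO

import pandas as pd

# Same data as above: row 1 has an empty title, row 3 has only an ID
txt = "ID,date,title\n1,2013,\n2,2012,cde\n3\n4,2015,fgh"

title = pd.read_csv(StringIO(txt), usecols=["title"])["title"]

print(title.dropna().tolist())    # ['cde', 'fgh']  (missing titles removed)
print(title.fillna("").tolist())  # ['', 'cde', '', 'fgh']  (missing titles blanked)
```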


dh7*_*762 5

In the end, it was much simpler:

import pandas as pd
data = pd.read_csv("mycsv.csv")
data.columns = ["ID", "date", "title"]
rawlist = list(data.title)
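
Reassigning data.columns is only needed when the header names in the file do not already match. Assuming the reason was a header with spaces after the commas (so the columns come through as ' date' and ' title'), stripping the names is an alternative to hard-coding them. This is a sketch under that assumption, not the asker's exact file:

```python
from io import StringIO

import pandas as pd

# Assumed input: header fields padded with spaces after the commas
txt = "ID, date, title\n1,2013,abc\n2,2012,cde"

data = pd.read_csv(StringIO(txt))
data.columns = data.columns.str.strip()  # ' title' -> 'title'
rawlist = list(data.title)
print(rawlist)  # ['abc', 'cde']
```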