Is it possible to create a pandas.DataFrame that has a column of list type?
For example, I want to load the following CSV into a pandas.DataFrame:
id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"
You can use:
import pandas as pd
import io
temp=u'''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''
df = pd.read_csv(io.StringIO(temp), sep=',', index_col=[0] )
print(df)
       scores
id
1   [1,2,3,4]
2       [1,2]
3     [0,2,4]
However, the dtype of the scores column is object, and each value is a string such as "[1,2,3,4]", not a list.
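To see what read_csv actually stores in each cell, here is a minimal check. The names `csv_text` and `first` are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Same sample data as in the question.
csv_text = '''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''

df = pd.read_csv(io.StringIO(csv_text), index_col=[0])

# Look up the scores cell for id 1 and inspect its Python type.
first = df.loc[1, 'scores']
print(type(first))
print(repr(first))
```

The cell holds the raw text from the file, so list operations such as membership tests on numbers will not behave as expected until the text is parsed.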
One approach is to use ast.literal_eval together with the converters parameter of read_csv:
import pandas as pd
import io
from ast import literal_eval
temp=u'''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''
def converter(x):
    # parse a string such as "[1,2,3,4]" into a Python list
    return literal_eval(x)
# map each column name to its converter function
converters={'scores': converter}
df = pd.read_csv(io.StringIO(temp), sep=',', converters=converters)
print(df)
   id        scores
0   1  [1, 2, 3, 4]
1   2        [1, 2]
2   3     [0, 2, 4]
# check that the cells are real lists:
print(2 in df.scores[2])
# True
print(1 in df.scores[2])
# False
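If the DataFrame has already been loaded without converters, the same parsing can be applied afterwards. This is a sketch of that variant, not part of the original answer; `csv_text` and `lengths` are illustrative names.

```python
import io
from ast import literal_eval

import pandas as pd

csv_text = '''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''

# Load the column as plain strings first.
df = pd.read_csv(io.StringIO(csv_text))

# Parse every string in the column into a Python list.
df['scores'] = df['scores'].apply(literal_eval)

# List operations now work per cell, for example taking the length.
lengths = df['scores'].apply(len)
print(lengths.tolist())
```

literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file.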
Remove the double quotes:
id,scores
1, [1,2,3,4]
2, [1,2]
3, [0,2,4]
and you should be able to do the following:
import pandas
query = [[1, [1,2,3,4]], [2, [1,2]], [3, [0,2,4]]]
df = pandas.DataFrame(query, columns=['id', 'scores'])
print(df)
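As a quick check that cells built this way hold real Python lists, here is a small sketch. The name `has_four` is illustrative and not from the original answer.

```python
import pandas as pd

# Build the DataFrame directly from nested Python lists.
query = [[1, [1, 2, 3, 4]], [2, [1, 2]], [3, [0, 2, 4]]]
df = pd.DataFrame(query, columns=['id', 'scores'])

# Each cell in the scores column is a list object.
print(type(df['scores'].iloc[0]))

# Membership tests can be applied row by row.
has_four = df['scores'].apply(lambda s: 4 in s)
print(has_four.tolist())
```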