Python import csv to list

Mor*_*nTN 167 python csv

I have a CSV file with about 2000 records.

Each record contains a string and a category:

This is the first line, Line1
This is the second line, Line2
This is the third line, Line3

I need to read this file into a list that looks like this:

List = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

How can I import this CSV into the list I need using Python?

Mac*_*Gol 279

Use the csv module (Python 2.x):

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', ' Line1'],
#  ['This is the second line', ' Line2'],
#  ['This is the third line', ' Line3']]

If you need tuples:

import csv
with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = map(tuple, reader)

print your_list
# [('This is the first line', ' Line1'),
#  ('This is the second line', ' Line2'),
#  ('This is the third line', ' Line3')]
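In Python 3, `map` returns a lazy iterator, so the tuple version needs `list()` or a list comprehension. Note also the leading space in `' Line1'`: the sample lines have a space after the comma, and `csv.reader` keeps it by default. A minimal Python 3 sketch, assuming the data lives in `test.csv`, that passes `skipinitialspace=True` to get exactly the list of tuples the question asks for:

```python
import csv

def read_tuples(path):
    # newline='' is what the csv docs recommend for files handed to csv.reader
    with open(path, newline='') as f:
        # skipinitialspace=True drops whitespace that follows each delimiter
        reader = csv.reader(f, skipinitialspace=True)
        return [tuple(row) for row in reader]

# Write the question's sample data so the sketch runs on its own
with open('test.csv', 'w', newline='') as f:
    f.write('This is the first line, Line1\n'
            'This is the second line, Line2\n'
            'This is the third line, Line3\n')

your_list = read_tuples('test.csv')
print(your_list)
# [('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]
```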

Python 3.x version (from @seokhoonlee below):

import csv

with open('file.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)
# [['This is the first line', ' Line1'],
#  ['This is the second line', ' Line2'],
#  ['This is the third line', ' Line3']]

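  • This doesn't work in Python 3.x: "csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)". See the answer below for a version that works in Python 3.x. (7 upvotes)
  • @DrunkenMaster, the `b` causes the file to be opened in binary mode instead of text mode. On some systems, text mode means that `\n` is converted to the platform-specific newline when reading or writing. [See the docs](https://docs.python.org/2/library/functions.html#open). (5 upvotes)
  • Why use 'rb' instead of 'r'? (4 upvotes)
  • To save people a few seconds of debugging, you should add a note to the first solution, such as "Python 2.x version". (2 upvotes)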
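The error quoted in the first comment is easy to reproduce in Python 3: `csv.reader` expects an iterator of `str`, while a file opened with `'rb'` yields `bytes`. A minimal sketch that uses in-memory `io.BytesIO` and `io.StringIO` objects as stand-ins for files opened in binary and text mode (the exact wording of the error varies between Python 3 versions):

```python
import csv
import io

error_message = None

# io.BytesIO stands in for a file opened with 'rb': iterating it yields bytes
binary_file = io.BytesIO(b'This is the first line, Line1\n')
try:
    rows = list(csv.reader(binary_file))
except csv.Error as exc:
    error_message = str(exc)  # mentions "iterator should return strings, not bytes"
    print(error_message)

# io.StringIO stands in for a text-mode file, e.g. open(path, newline='')
text_file = io.StringIO('This is the first line, Line1\n')
rows = list(csv.reader(text_file))
print(rows)  # [['This is the first line', ' Line1']]
```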

seo*_*lee 50

Update for Python 3:

import csv

with open('file.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)
# [['This is the first line', ' Line1'],
#  ['This is the second line', ' Line2'],
#  ['This is the third line', ' Line3']]


Mar*_*oma 38

Pandas is very good at dealing with data. Here is one example of how to use it:

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts (one dict per row)
dicts = df.to_dict('records')

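One big advantage is that pandas handles the header row automatically.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?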
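For comparison with pandas' header handling, the standard library's `csv.DictReader` also takes the field names from the first row and returns one dict per remaining row, similar to `to_dict('records')` except that every value stays a string. A sketch, assuming a hypothetical file `with_header.csv` whose first row is `text,category` (the question's own file has no header row, in which case you would pass `fieldnames=[...]` to `DictReader` instead):

```python
import csv

# Hypothetical sample file with a header row
with open('with_header.csv', 'w', newline='') as f:
    f.write('text,category\n'
            'This is the first line,Line1\n'
            'This is the second line,Line2\n')

with open('with_header.csv', newline='') as f:
    # DictReader consumes the first row as the field names
    records = list(csv.DictReader(f))

print(records)
```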

Pandas #2

import pandas as pd

# Get data - an example DataFrame from mpu stands in for reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is:

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

import pandas as pd

# Get data - an example DataFrame from mpu stands in for reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
tuples = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of tuples is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]


Alg*_*bra 6

Update for Python 3:

import csv
from pprint import pprint

with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    l = list(map(tuple, reader))

pprint(l)
# [('This is the first line', ' Line1'),
#  ('This is the second line', ' Line2'),
#  ('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline='' (from the csv module documentation).
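One case where the raw line endings matter: a quoted field can itself contain a line break, so one record may span several physical lines. A small sketch, using a hypothetical file `multiline.csv`, that writes and reads such a record with `newline=''` on both sides:

```python
import csv

# Hypothetical file: the first record's text field contains a line break
with open('multiline.csv', 'w', newline='') as f:
    writer = csv.writer(f)  # quotes the field because it contains '\n'
    writer.writerow(['first part\nsecond part', 'Line1'])
    writer.writerow(['This is the second line', 'Line2'])

with open('multiline.csv', newline='') as f:
    rows = list(csv.reader(f))

print(len(rows))  # 2 records, although the file has 3 physical lines
print(rows[0])    # ['first part\nsecond part', 'Line1']
```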


Miq*_*uel 5

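If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split each line on `,`, and append the result to `List`.

That said, it looks like you are dealing with a CSV file, so you might consider using the csv module for it.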
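The reason to prefer the csv module: once a field contains a comma, CSV writers quote it, and a plain `split(',')` breaks the field apart. A minimal comparison on a hypothetical quoted record:

```python
import csv
import io

# A hypothetical record whose text field contains a comma, so it is quoted
line = '"Hello, world",Line4\n'

naive = line.rstrip('\n').split(',')
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['"Hello', ' world"', 'Line4']
print(parsed)  # ['Hello, world', 'Line4']
```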


小智 5

result = []
with open('file.csv') as f:  # assumes the data is in file.csv
    text = f.read()
for line in text.splitlines():
    result.append(tuple(line.split(",")))

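  • I know Barranka's comment is over a year old, but for anyone who stumbles upon this and can't figure it out: _for line in text.splitlines():_ puts each individual line into the temporary variable "line". _line.split(",")_ creates a list of strings split on commas. _tuple(~)_ turns that list into a tuple, and _append(~)_ adds it to result. After the loop, _result_ is a list of tuples, one tuple per line, where each tuple element is one field from the csv file. (4 upvotes)
  • Could you add some explanation to this post? Code alone is (sometimes) fine, but code plus explanation is (most of the time) better. (2 upvotes)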
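If you stay with manual splitting, splitting from the right with a limit of one keeps any commas inside the text part intact, under the assumption that the category itself never contains a comma; `strip()` removes the space after the comma. A sketch over the question's sample text (not a general CSV parser: quoted fields are not handled):

```python
text = """This is the first line, Line1
This is the second line, Line2
This is the third line, Line3"""

result = []
for line in text.splitlines():
    if not line.strip():
        continue  # skip blank lines, e.g. a trailing empty line
    # Split only at the last comma, assuming the category has no commas
    sentence, category = line.rsplit(',', 1)
    result.append((sentence.strip(), category.strip()))

print(result)
```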

Kid*_*ddo 5

You can use the list() function to convert the csv reader object to a list:

import csv

with open('input.csv', newline='') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
    print(rows)